Semantic and Relation Modulation for Audio-Visual Event Localization
We study the problem of localizing audio-visual events that are both audible and visible in a video. Existing works focus on encoding and aligning audio and visual features at the segment level while neglecting informative correlation between segments of the two modalities and between multi-scale ev...
Bibliographic Details
| Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 6, dated 16 June, pages 7711-7725 |
| Main author: | Wang, Hao (author) |
| Other authors: | Zha, Zheng-Jun; Li, Liang; Chen, Xuejin; Luo, Jiebo |
| Format: | Online article |
| Language: | English |
| Published: | 2023 |
| Access to the parent work: | IEEE transactions on pattern analysis and machine intelligence |
| Subjects: | Journal Article |