Semantic and Relation Modulation for Audio-Visual Event Localization

We study the problem of localizing audio-visual events that are both audible and visible in a video. Existing works focus on encoding and aligning audio and visual features at the segment level while neglecting informative correlation between segments of the two modalities and between multi-scale ev...

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 6, 16 June, pages 7711-7725
Main author: Wang, Hao (Author)
Other authors: Zha, Zheng-Jun, Li, Liang, Chen, Xuejin, Luo, Jiebo
Format: Online article
Language: English
Published: 2023
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article