Robust Audio-Visual Contrastive Learning for Proposal-Based Self-Supervised Sound Source Localization in Videos
By observing a scene and listening to corresponding audio cues, humans can easily recognize where the sound is. To achieve such cross-modal perception on machines, existing methods take advantage of the maps obtained by interpolation operations to localize the sound source. As semantic object-level...
Ausführliche Beschreibung
Bibliographische Detailangaben
Veröffentlicht in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 7 vom: 01. Juni, Seite 4896-4907
|
1. Verfasser: |
Xuan, Hanyu
(VerfasserIn) |
Weitere Verfasser: |
Wu, Zhiliang,
Yang, Jian,
Jiang, Bo,
Luo, Lei,
Alameda-Pineda, Xavier,
Yan, Yan |
Format: | Online-Aufsatz
|
Sprache: | English |
Veröffentlicht: |
2024
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on pattern analysis and machine intelligence
|
Schlagworte: | Journal Article |