Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion

Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around an...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on pattern analysis and machine intelligence. - 1979. - 40(2018), 5 vom: 19. Mai, Seite 1086-1099
1. Verfasser: Gebru, Israel D (VerfasserIn)
Weitere Verfasser: Ba, Sileye, Li, Xiaofei, Horaud, Radu
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2018
Zugriff auf das übergeordnete Werk:IEEE transactions on pattern analysis and machine intelligence
Schlagworte:Journal Article Research Support, Non-U.S. Gov't