Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around an...
Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 40(2018), 5, 19 May, pages 1086-1099 |
---|---|
Main author: | |
Other authors: | , , |
Format: | Online article |
Language: | English |
Published: | 2018 |
Access to the parent work: | IEEE transactions on pattern analysis and machine intelligence |
Subjects: | Journal Article; Research Support, Non-U.S. Gov't |
Available online: | Full text |