Multistream articulatory feature-based models for visual speech recognition


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31 (2009), No. 9, 15 Sept., pp. 1700-1707
Main author: Saenko, Kate
Other authors: Livescu, Karen; Glass, James; Darrell, Trevor
Format: Online article
Language: English
Published: 2009
Parent work: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Evaluation Study; Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
Description
Abstract: We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DBN)-based models consisting of multiple sequences of hidden states, each corresponding to an articulatory feature (AF) such as lip opening (LO) or lip rounding (LR). A bank of discriminative articulatory feature classifiers provides input to the DBN, in the form of either virtual evidence (VE) (scaled likelihoods) or raw classifier margin outputs. We present experiments on two tasks, a medium-vocabulary word-ranking task and a small-vocabulary phrase recognition task. We show that articulatory feature-based models outperform baseline models, and we study several aspects of the models, such as the effects of allowing articulatory asynchrony, of using dictionary-based versus whole-word models, and of incorporating classifier outputs via virtual evidence versus alternative observation models.
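The "virtual evidence (scaled likelihoods)" mentioned in the abstract refers to a standard hybrid-model trick: a discriminative classifier's per-frame posterior P(state | frame) is divided by the state prior P(state) to obtain a quantity proportional to the likelihood P(frame | state), which a DBN can consume as an observation score. A minimal illustrative sketch (not the authors' code; array shapes and the toy lip-opening states are assumptions for illustration):

```python
import numpy as np

def scaled_likelihoods(posteriors, priors):
    """Convert per-frame classifier posteriors into scaled likelihoods.

    posteriors: (T, K) array, P(state_k | frame_t) from an AF classifier
    priors:     (K,)  array, P(state_k) estimated from training data
    Returns (T, K) values proportional to P(frame_t | state_k),
    the "virtual evidence" form a DBN can use in place of a
    generative observation model.
    """
    return posteriors / priors  # broadcasts the priors over frames

# Toy example: 2 frames, hypothetical lip-opening states {closed, narrow, wide}
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.3, 0.6]])
priors = np.array([0.5, 0.3, 0.2])
ve = scaled_likelihoods(posteriors, priors)
```

Dividing by the prior removes the classifier's built-in bias toward frequent states, so the DBN's own transition structure (including any modeled articulatory asynchrony) supplies the prior information instead.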
Description: Date Completed 06.10.2009
Date Revised 10.12.2019
published: Print
Citation Status MEDLINE
ISSN:1939-3539
DOI:10.1109/TPAMI.2008.303