BM3 E : discriminative density propagation for visual tracking

We introduce BM3 E, a Conditional Bayesian Mixture of Experts Markov Model, for consistent probabilistic estimates in discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models e...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on pattern analysis and machine intelligence. - 1979. - 29(2007), 11 vom: 11. Nov., Seite 2030-44
1. Verfasser: Sminchisescu, Cristian (VerfasserIn)
Weitere Verfasser: Kanaujia, Atul, Metaxas, Dimitris N
Format: Aufsatz
Sprache:English
Veröffentlicht: 2007
Zugriff auf das übergeordnete Werk:IEEE transactions on pattern analysis and machine intelligence
Schlagworte:Journal Article Research Support, U.S. Gov't, Non-P.H.S.
Beschreibung
Zusammenfassung:We introduce BM3 E, a Conditional Bayesian Mixture of Experts Markov Model, for consistent probabilistic estimates in discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models estimated with Kalman filtering or particle filtering. Instead of inverting a non-linear generative observation model at run-time, we learn to cooperatively predict complex state distributions directly from descriptors that encode image observations - typically bag-of-feature global image histograms or descriptors computed over regular spatial grids. These are integrated in a conditional graphical model in order to enforce temporal smoothness constraints and allow a principled management of uncertainty. The algorithms combine sparsity, mixture modeling, and non-linear dimensionality reduction for efficient computation in high-dimensional continuous state spaces. The combined system automatically self-initializes and recovers from failure. The research has three contributions: (1) We establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) We propose flexible supervised and unsupervised algorithms for learning feedforward, multivalued contextual mappings (multimodal state distributions) based on compact, conditional Bayesian mixture of experts models; (3) We validate the framework empirically for the reconstruction of 3d human motion in monocular video sequences. Our tests on both real and motion capture-based sequences show significant performance gains with respect to competing nearest-neighbor, regression, and structured prediction methods
Beschreibung:Date Completed 13.12.2007
Date Revised 12.09.2007
published: Print
Citation Status MEDLINE
ISSN:1939-3539