BM3 E : discriminative density propagation for visual tracking
We introduce BM3 E, a Conditional Bayesian Mixture of Experts Markov Model, for consistent probabilistic estimates in discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models e...
Veröffentlicht in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 29(2007), 11 vom: 11. Nov., Seite 2030-44 |
---|---|
1. Verfasser: | |
Weitere Verfasser: | , |
Format: | Aufsatz |
Sprache: | English |
Veröffentlicht: |
2007
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on pattern analysis and machine intelligence |
Schlagworte: | Journal Article Research Support, U.S. Gov't, Non-P.H.S. |
Zusammenfassung: | We introduce BM3 E, a Conditional Bayesian Mixture of Experts Markov Model, for consistent probabilistic estimates in discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models estimated with Kalman filtering or particle filtering. Instead of inverting a non-linear generative observation model at run-time, we learn to cooperatively predict complex state distributions directly from descriptors that encode image observations - typically bag-of-feature global image histograms or descriptors computed over regular spatial grids. These are integrated in a conditional graphical model in order to enforce temporal smoothness constraints and allow a principled management of uncertainty. The algorithms combine sparsity, mixture modeling, and non-linear dimensionality reduction for efficient computation in high-dimensional continuous state spaces. The combined system automatically self-initializes and recovers from failure. The research has three contributions: (1) We establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) We propose flexible supervised and unsupervised algorithms for learning feedforward, multivalued contextual mappings (multimodal state distributions) based on compact, conditional Bayesian mixture of experts models; (3) We validate the framework empirically for the reconstruction of 3d human motion in monocular video sequences. Our tests on both real and motion capture-based sequences show significant performance gains with respect to competing nearest-neighbor, regression, and structured prediction methods |
---|---|
Beschreibung: | Date Completed 13.12.2007 Date Revised 12.09.2007 published: Print Citation Status MEDLINE |
ISSN: | 1939-3539 |