Learning Energy-Based Spatial-Temporal Generative ConvNets for Dynamic Patterns

Video sequences contain rich dynamic patterns, such as dynamic texture patterns that exhibit stationarity in the temporal domain, and action patterns that are non-stationary in either spatial or temporal domain. We show that an energy-based spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns. The model defines a probability distribution on the video sequence, and the log probability is defined by a spatial-temporal ConvNet that consists of multiple layers of spatial-temporal filters to capture spatial-temporal patterns of different scales. The model can be learned from the training video sequences by an "analysis by synthesis" learning algorithm that iterates the following two steps. Step 1 synthesizes video sequences from the currently learned model. Step 2 then updates the model parameters based on the difference between the synthesized video sequences and the observed training sequences. We show that the learning algorithm can synthesize realistic dynamic patterns. We also show that it is possible to learn the model from incomplete training sequences with either occluded pixels or missing frames, so that model learning and pattern completion can be accomplished simultaneously.
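The "analysis by synthesis" loop summarized above is concrete enough to sketch in code. Below is a minimal, hypothetical illustration in PyTorch (an assumption; the abstract does not name a framework): Step 1 is approximated by Langevin dynamics on the video tensor, and Step 2 by a stochastic gradient update whose direction follows the maximum-likelihood gradient estimate. The network architecture, step sizes, iteration counts, and the random initialization of the synthesis chains are illustrative placeholders, not the paper's settings.

import torch
import torch.nn as nn

class SpatialTemporalScore(nn.Module):
    """f(x; theta): a stack of spatial-temporal (3D) filters; a higher
    score means lower energy, i.e. a more probable video."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(5, 7, 7), stride=2, padding=(2, 3, 3)),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), stride=2, padding=(1, 2, 2)),
            nn.ReLU(),
        )

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        return self.net(x).sum(dim=(1, 2, 3, 4))  # one scalar score per video

def langevin_synthesis(model, x, n_steps=20, step_size=0.01):
    """Step 1: synthesize videos from the currently learned model by
    Langevin dynamics: ascend the score, plus Gaussian diffusion noise."""
    x = x.detach().clone().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(model(x).sum(), x)[0]
        noise = torch.randn_like(x)
        x = x + 0.5 * step_size ** 2 * grad + step_size * noise
        x = x.detach().requires_grad_(True)
    return x.detach()

model = SpatialTemporalScore()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
observed = torch.rand(4, 3, 16, 32, 32)  # stand-in for training videos

for iteration in range(100):
    # Step 1: sample videos from the current model.
    synthesized = langevin_synthesis(model, torch.rand_like(observed))
    # Step 2: update parameters from the synthesized-vs-observed difference.
    loss = model(synthesized).mean() - model(observed).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Note that the quantity minimized in Step 2 is not a fixed objective: at each iteration its gradient approximates the log-likelihood gradient, with the synthesized batch serving as a Monte Carlo estimate of the expectation under the model.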

Bibliographic details

Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 43(2021), 2, 06 Feb., pages 516-531
Main author: Xie, Jianwen (Author)
Other authors: Zhu, Song-Chun, Wu, Ying Nian
Format: Online article
Language: English
Published: 2021
In collection: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652 4500
001 NLM300350473
003 DE-627
005 20250225200147.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2019.2934852  |2 doi 
028 5 2 |a pubmed25n1001.xml 
035 |a (DE-627)NLM300350473 
035 |a (NLM)31425020 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Xie, Jianwen  |e verfasserin  |4 aut 
245 1 0 |a Learning Energy-Based Spatial-Temporal Generative ConvNets for Dynamic Patterns 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 11.01.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Video sequences contain rich dynamic patterns, such as dynamic texture patterns that exhibit stationarity in the temporal domain, and action patterns that are non-stationary in either spatial or temporal domain. We show that an energy-based spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns. The model defines a probability distribution on the video sequence, and the log probability is defined by a spatial-temporal ConvNet that consists of multiple layers of spatial-temporal filters to capture spatial-temporal patterns of different scales. The model can be learned from the training video sequences by an "analysis by synthesis" learning algorithm that iterates the following two steps. Step 1 synthesizes video sequences from the currently learned model. Step 2 then updates the model parameters based on the difference between the synthesized video sequences and the observed training sequences. We show that the learning algorithm can synthesize realistic dynamic patterns. We also show that it is possible to learn the model from incomplete training sequences with either occluded pixels or missing frames, so that model learning and pattern completion can be accomplished simultaneously.
650 4 |a Journal Article 
700 1 |a Zhu, Song-Chun  |e verfasserin  |4 aut 
700 1 |a Wu, Ying Nian  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 43(2021), 2 vom: 06. Feb., Seite 516-531  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:43  |g year:2021  |g number:2  |g day:06  |g month:02  |g pages:516-531 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2019.2934852  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 43  |j 2021  |e 2  |b 06  |c 02  |h 516-531