PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 11 of: 12 Nov., pages 12798-12815
First author: Baradel, Fabien (Author)
Other authors: Bregier, Romain, Groueix, Thibault, Weinzaepfel, Philippe, Kalantidis, Yannis, Rogez, Gregory
Format: Online article
Language: English
Published: 2023
Parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM355204436
003 DE-627
005 20231226063842.0
007 cr uuu---uuuuu
008 231226s2023 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2022.3216899  |2 doi 
028 5 2 |a pubmed24n1183.xml 
035 |a (DE-627)NLM355204436 
035 |a (NLM)37015699 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Baradel, Fabien  |e author  |4 aut 
245 1 0 |a PoseBERT  |b A Generic Transformer Module for Temporal 3D Human Modeling 
264 1 |c 2023 
336 |a Text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Date Completed 04.10.2023 
500 |a Date Revised 13.10.2023 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a Training state-of-the-art models for human pose estimation in videos requires datasets with annotations that are really hard and expensive to obtain. Although transformers have been recently utilized for body pose sequence modeling, related methods rely on pseudo-ground truth to augment the currently limited training data available for learning such models. In this paper, we introduce PoseBERT, a transformer module that is fully trained on 3D Motion Capture (MoCap) data via masked modeling. It is simple, generic and versatile, as it can be plugged on top of any image-based model to transform it into a video-based model leveraging temporal information. We showcase variants of PoseBERT with different inputs varying from 3D skeleton keypoints to rotations of a 3D parametric model for either the full body (SMPL) or just the hands (MANO). Since PoseBERT training is task agnostic, the model can be applied to several tasks such as pose refinement, future pose prediction or motion completion without finetuning. Our experimental results validate that adding PoseBERT on top of various state-of-the-art pose estimation methods consistently improves their performance, while its low computational cost allows us to use it in a real-time demo for smoothly animating a robotic hand via a webcam. Test code and models are available at https://github.com/naver/posebert 
650 4 |a Journal Article 
700 1 |a Bregier, Romain  |e author  |4 aut 
700 1 |a Groueix, Thibault  |e author  |4 aut 
700 1 |a Weinzaepfel, Philippe  |e author  |4 aut 
700 1 |a Kalantidis, Yannis  |e author  |4 aut 
700 1 |a Rogez, Gregory  |e author  |4 aut 
773 0 8 |i Contained in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 45(2023), 11 of: 12 Nov., pages 12798-12815  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:45  |g year:2023  |g number:11  |g day:12  |g month:11  |g pages:12798-12815 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2022.3216899  |3 Full text 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 45  |j 2023  |e 11  |b 12  |c 11  |h 12798-12815