3D Hand Pose Estimation Using Synthetic Data and Weakly Labeled RGB Images

Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to the substantial depth ambiguity and the difficulty of obtaining fully-annotated training data. Different from the existing learning-based monocular RGB-input approaches t...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on pattern analysis and machine intelligence. - 1979. - 43(2021), 11 vom: 08. Nov., Seite 3739-3753
1. Verfasser: Cai, Yujun (VerfasserIn)
Weitere Verfasser: Ge, Liuhao, Cai, Jianfei, Thalmann, Nadia Magnenat, Yuan, Junsong
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2021
Zugriff auf das übergeordnete Werk:IEEE transactions on pattern analysis and machine intelligence
Schlagworte:Journal Article Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM309791758
003 DE-627
005 20231225134705.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2020.2993627  |2 doi 
028 5 2 |a pubmed24n1032.xml 
035 |a (DE-627)NLM309791758 
035 |a (NLM)32396073 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Cai, Yujun  |e verfasserin  |4 aut 
245 1 0 |a 3D Hand Pose Estimation Using Synthetic Data and Weakly Labeled RGB Images 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a ƒaComputermedien  |b c  |2 rdamedia 
338 |a ƒa Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 03.11.2021 
500 |a Date Revised 03.11.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to the substantial depth ambiguity and the difficulty of obtaining fully-annotated training data. Different from the existing learning-based monocular RGB-input approaches that require accurate 3D annotations for training, we propose to leverage the depth images that can be easily obtained from commodity RGB-D cameras during training, while during testing we take only RGB inputs for 3D joint predictions. In this way, we alleviate the burden of the costly 3D annotations in real-world dataset. Particularly, we propose a weakly-supervised method, adaptating from fully-annotated synthetic dataset to weakly-labeled real-world single RGB dataset with the aid of a depth regularizer, which serves as weak supervision for 3D pose prediction. To further exploit the physical structure of 3D hand pose, we present a novel CVAE-based statistical framework to embed the pose-specific subspace from RGB images, which can then be used to infer the 3D hand joint locations. Extensive experiments on benchmark datasets validate that our proposed approach outperforms baselines and state-of-the-art methods, which proves the effectiveness of the proposed depth regularizer and the CVAE-based framework 
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Ge, Liuhao  |e verfasserin  |4 aut 
700 1 |a Cai, Jianfei  |e verfasserin  |4 aut 
700 1 |a Thalmann, Nadia Magnenat  |e verfasserin  |4 aut 
700 1 |a Yuan, Junsong  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 43(2021), 11 vom: 08. Nov., Seite 3739-3753  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:43  |g year:2021  |g number:11  |g day:08  |g month:11  |g pages:3739-3753 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2020.2993627  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 43  |j 2021  |e 11  |b 08  |c 11  |h 3739-3753