3D Hand Pose Estimation Using Synthetic Data and Weakly Labeled RGB Images

Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to the substantial depth ambiguity and the difficulty of obtaining fully-annotated training data. Different from the existing learning-based monocular RGB-input approaches t...


Bibliographic Details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 43(2021), 11, 8 Nov., pages 3739-3753
Main author:	Cai, Yujun (author)
Other authors:	Ge, Liuhao, Cai, Jianfei, Thalmann, Nadia Magnenat, Yuan, Junsong
Format:	Online article
Language:	English
Published:	2021
Access to the parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article; Research Support, Non-U.S. Gov't


LEADER	01000naa a22002652 4500
001	NLM309791758
003	DE-627
005	20231225134705.0
007	cr uuu---uuuuu
008	231225s2021 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TPAMI.2020.2993627 \|2 doi
028	5	2	\|a pubmed24n1032.xml
035			\|a (DE-627)NLM309791758
035			\|a (NLM)32396073
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Cai, Yujun \|e verfasserin \|4 aut
245	1	0	\|a 3D Hand Pose Estimation Using Synthetic Data and Weakly Labeled RGB Images
264		1	\|c 2021
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 03.11.2021
500			\|a Date Revised 03.11.2021
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to the substantial depth ambiguity and the difficulty of obtaining fully-annotated training data. Different from the existing learning-based monocular RGB-input approaches that require accurate 3D annotations for training, we propose to leverage the depth images that can be easily obtained from commodity RGB-D cameras during training, while during testing we take only RGB inputs for 3D joint predictions. In this way, we alleviate the burden of the costly 3D annotations in real-world datasets. Particularly, we propose a weakly-supervised method, adapting from a fully-annotated synthetic dataset to a weakly-labeled real-world single-RGB dataset with the aid of a depth regularizer, which serves as weak supervision for 3D pose prediction. To further exploit the physical structure of 3D hand pose, we present a novel CVAE-based statistical framework to embed the pose-specific subspace from RGB images, which can then be used to infer the 3D hand joint locations. Extensive experiments on benchmark datasets validate that our proposed approach outperforms baselines and state-of-the-art methods, which proves the effectiveness of the proposed depth regularizer and the CVAE-based framework.
650		4	\|a Journal Article
650		4	\|a Research Support, Non-U.S. Gov't
700	1		\|a Ge, Liuhao \|e verfasserin \|4 aut
700	1		\|a Cai, Jianfei \|e verfasserin \|4 aut
700	1		\|a Thalmann, Nadia Magnenat \|e verfasserin \|4 aut
700	1		\|a Yuan, Junsong \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on pattern analysis and machine intelligence \|d 1979 \|g 43(2021), 11 vom: 08. Nov., Seite 3739-3753 \|w (DE-627)NLM098212257 \|x 1939-3539 \|7 nnns
773	1	8	\|g volume:43 \|g year:2021 \|g number:11 \|g day:08 \|g month:11 \|g pages:3739-3753
856	4	0	\|u http://dx.doi.org/10.1109/TPAMI.2020.2993627 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 43 \|j 2021 \|e 11 \|b 08 \|c 11 \|h 3739-3753