Towards Fully Mobile 3D Face, Body, and Environment Capture Using Only Head-worn Cameras

We propose a new approach for 3D reconstruction of dynamic indoor and outdoor scenes in everyday environments, leveraging only cameras worn by a user. This approach allows 3D reconstruction of experiences at any location and virtual tours from anywhere. The key innovation of the proposed ego-centric...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on visualization and computer graphics. - 1996. - 24(2018), 11 vom: 10. Nov., Seite 2993-3004
1. Verfasser:	Cha, Young-Woon (VerfasserIn)
Weitere Verfasser:	Price, True, Wei, Zhen, Lu, Xinran, Rewkowski, Nicholas, Chabra, Rohan, Qin, Zihe, Kim, Hyounghun, Su, Zhaoqi, Liu, Yebin, Ilie, Adrian, State, Andrei, Xu, Zhenlin, Frahm, Jan-Michael, Fuchs, Henry
Format:	Online-Aufsatz
Sprache:	English
Veröffentlicht:	2018
Zugriff auf das übergeordnete Werk:	IEEE transactions on visualization and computer graphics
Schlagworte:	Journal Article Research Support, Non-U.S. Gov't Research Support, U.S. Gov't, Non-P.H.S.

Beschreibung
Zusammenfassung:	We propose a new approach for 3D reconstruction of dynamic indoor and outdoor scenes in everyday environments, leveraging only cameras worn by a user. This approach allows 3D reconstruction of experiences at any location and virtual tours from anywhere. The key innovation of the proposed ego-centric reconstruction system is to capture the wearer's body pose and facial expression from near-body views, e.g. cameras on the user's glasses, and to capture the surrounding environment using outward-facing views. The main challenge of the ego-centric reconstruction, however, is the poor coverage of the near-body views - that is, the user's body and face are observed from vantage points that are convenient for wear but inconvenient for capture. To overcome these challenges, we propose a parametric-model-based approach to user motion estimation. This approach utilizes convolutional neural networks (CNNs) for near-view body pose estimation, and we introduce a CNN-based approach for facial expression estimation that combines audio and video. For each time-point during capture, the intermediate model-based reconstructions from these systems are used to re-target a high-fidelity pre-scanned model of the user. We demonstrate that the proposed self-sufficient, head-worn capture system is capable of reconstructing the wearer's movements and their surrounding environment in both indoor and outdoor situations without any additional views. As a proof of concept, we show how the resulting 3D-plus-time reconstruction can be immersively experienced within a virtual reality system (e.g., the HTC Vive). We expect that the size of the proposed egocentric capture-and-reconstruction system will eventually be reduced to fit within future AR glasses, and will be widely useful for immersive 3D telepresence, virtual tours, and general use-anywhere 3D content creation
Beschreibung:	Date Completed 17.09.2019 Date Revised 10.12.2019 published: Print-Electronic Citation Status MEDLINE
ISSN:	1941-0506
DOI:	10.1109/TVCG.2018.2868527