Learning View Synthesis for Desktop Telepresence with Few RGBD Cameras

Bibliographic details
Published in: IEEE transactions on visualization and computer graphics. - 1996. - PP(2024), 14 June
First author: Wang, Shengze (author)
Other authors: Wang, Ziheng; Schmelzle, Ryan; Zheng, Liujie; Kwon, YoungJoong; Sengupta, Roni; Fuchs, Henry
Format: Online article
Language: English
Published: 2024
Access to parent work: IEEE transactions on visualization and computer graphics
Subjects: Journal Article
Description
Abstract: Recent telepresence systems have shown significant improvements in quality compared to prior systems. However, they struggle to achieve both low cost and high quality at the same time. In this work, we envision a future where telepresence systems become a commodity and can be installed on typical desktops. To this end, we present a high-quality view synthesis method that uses a cost-effective capture system that consists of commodity hardware accessible to the general public. We propose a neural renderer that uses a few RGBD cameras as input to synthesize novel views of a user and their surroundings. At the core of the renderer is Multi-Layer Point Cloud (MPC), a novel 3D representation that improves reconstruction accuracy by removing non-linear biases in depth cameras. Our temporally-aware renderer further improves the stability of synthesized videos by conditioning on past information. Additionally, we propose Spatial Skip Connections (SSC) to improve image upsampling under limited GPU memory. Experimental results show that our renderer outperforms recent methods in terms of view synthesis quality. Our method generalizes to new users and challenging content (e.g., hand gestures and clothing deformation) without costly per-video optimization, object templates, or heavy pre-processing. The code and dataset will be made available.
Description: Date Revised 25.06.2024
published: Print-Electronic
Citation Status Publisher
ISSN: 1941-0506
DOI: 10.1109/TVCG.2024.3411626