Integrating Both Parallax and Latency Compensation into Video See-through Head-mounted Display

Bibliographic details
Published in:	IEEE transactions on visualization and computer graphics. - 1996. - 29(2023), 5, dated 4 May, pages 2826-2836
Main author:	Ishihara, Atsushi (Author)
Other authors:	Aga, Hiroyuki, Ishihara, Yasuko, Ichikawa, Hirotake, Kaji, Hidetaka, Kawasaki, Koichi, Kobayashi, Daita, Kobayashi, Toshimi, Nishida, Ken, Hamasaki, Takumi, Mori, Hideto, Morikubo, Yuki
Format:	Online article
Language:	English
Published:	2023
Collection access:	IEEE transactions on visualization and computer graphics
Subjects:	Journal Article

Description
Abstract:	This work introduces a perspective-corrected video see-through mixed-reality head-mounted display with edge-preserving occlusion and low-latency capabilities. To realize the consistent spatial and temporal composition of a captured real world containing virtual objects, we perform three essential tasks: 1) to reconstruct captured images so as to match the user's view; 2) to occlude virtual objects with nearer real objects, to provide users with correct depth cues; and 3) to reproject the virtual and captured scenes to be matched and to keep up with users' head motions. Captured image reconstruction and occlusion-mask generation require dense and accurate depth maps. However, estimating these maps is computationally difficult, which results in longer latencies. To obtain an acceptable balance between spatial consistency and low latency, we rapidly generated depth maps by focusing on edge smoothness and disocclusion (instead of fully accurate maps), to shorten the processing time. Our algorithm refines edges via a hybrid method involving infrared masks and color-guided filters, and it fills disocclusions using temporally cached depth maps. Our system combines these algorithms in a two-phase temporal warping architecture based upon synchronized camera pairs and displays. The first phase of warping is to reduce registration errors between the virtual and captured scenes. The second is to present virtual and captured scenes that correspond with the user's head motion. We implemented these methods on our wearable prototype and performed end-to-end measurements of its accuracy and latency. We achieved an acceptable latency due to head motion (less than 4 ms) and spatial accuracy (less than 0.1° in size and less than 0.3° in position) in our test environment. We anticipate that this work will help improve the realism of mixed reality systems.
Description:	Date revised: 04.04.2025; published: Print-Electronic; citation status: PubMed-not-MEDLINE
ISSN:	1941-0506
DOI:	10.1109/TVCG.2023.3247460