G2-MonoDepth : A General Framework of Generalized Depth Inference From Monocular RGB+X Data
Monocular depth inference is a fundamental problem for scene perception of robots. Specific robots may be equipped with a camera plus an optional depth sensor of any type and located in various scenes of different scales, whereas recent advances derived multiple individual sub-tasks. It leads to add...
Veröffentlicht in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 5 vom: 25. Mai, Seite 3753-3771 |
---|---|
1. Verfasser: | |
Weitere Verfasser: | , |
Format: | Online-Aufsatz |
Sprache: | English |
Veröffentlicht: |
2024
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on pattern analysis and machine intelligence |
Schlagworte: | Journal Article |
Zusammenfassung: | Monocular depth inference is a fundamental problem for scene perception of robots. Specific robots may be equipped with a camera plus an optional depth sensor of any type and located in various scenes of different scales, whereas recent advances derived multiple individual sub-tasks. It leads to additional burdens to fine-tune models for specific robots and thereby high-cost customization in large-scale industrialization. This article investigates a unified task of monocular depth inference, which infers high-quality depth maps from all kinds of input raw data from various robots in unseen scenes. A basic benchmark G2-MonoDepth is developed for this task, which comprises four components: (a) a unified data representation RGB+X to accommodate RGB plus raw depth with diverse scene scale/semantics, depth sparsity ([0%, 100%]) and errors (holes/noises/blurs), (b) a novel unified loss to adapt to diverse depth sparsity/errors of input raw data and diverse scales of output scenes, (c) an improved network to well propagate diverse scene scales from input to output, and (d) a data augmentation pipeline to simulate all types of real artifacts in raw depth maps for training. G2-MonoDepth is applied in three sub-tasks including depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes, and it always outperforms SOTA baselines on both real-world data and synthetic data |
---|---|
Beschreibung: | Date Revised 05.04.2024 published: Print-Electronic Citation Status PubMed-not-MEDLINE |
ISSN: | 1939-3539 |
DOI: | 10.1109/TPAMI.2023.3346466 |