MLDA-Net : Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation
The success of supervised learning-based single image depth estimation methods critically depends on the availability of large-scale dense per-pixel depth annotations, which requires both laborious and expensive annotation process. Therefore, the self-supervised methods are much desirable, which att...
Veröffentlicht in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30(2021) vom: 02., Seite 4691-4705 |
---|---|
1. Verfasser: | |
Weitere Verfasser: | , , , , , |
Format: | Online-Aufsatz |
Sprache: | English |
Veröffentlicht: |
2021
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |
Schlagworte: | Journal Article |
Zusammenfassung: | The success of supervised learning-based single image depth estimation methods critically depends on the availability of large-scale dense per-pixel depth annotations, which requires both laborious and expensive annotation process. Therefore, the self-supervised methods are much desirable, which attract significant attention recently. However, depth maps predicted by existing self-supervised methods tend to be blurry with many depth details lost. To overcome these limitations, we propose a novel framework, named MLDA-Net, to obtain per-pixel depth maps with shaper boundaries and richer depth details. Our first innovation is a multi-level feature extraction (MLFE) strategy which can learn rich hierarchical representation. Then, a dual-attention strategy, combining global attention and structure attention, is proposed to intensify the obtained features both globally and locally, resulting in improved depth maps with sharper boundaries. Finally, a reweighted loss strategy based on multi-level outputs is proposed to conduct effective supervision for self-supervised depth estimation. Experimental results demonstrate that our MLDA-Net framework achieves state-of-the-art depth prediction results on the KITTI benchmark for self-supervised monocular depth estimation with different input modes and training modes. Extensive experiments on other benchmark datasets further confirm the superiority of our proposed approach |
---|---|
Beschreibung: | Date Revised 04.05.2021 published: Print-Electronic Citation Status PubMed-not-MEDLINE |
ISSN: | 1941-0042 |
DOI: | 10.1109/TIP.2021.3074306 |