Auto-Rectify Network for Unsupervised Indoor Depth Estimation

Single-view depth estimation using CNNs trained from unlabelled videos has shown significant promise. However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices. In this work, we establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth. Our fundamental analysis suggests that the rotation behaves as noise during training, as opposed to the translation (baseline), which provides supervision signals. To address the challenge, we propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning. The significantly improved performance validates our motivation. Towards end-to-end learning without requiring pre-processing, we propose an Auto-Rectify Network with novel loss functions, which can automatically learn to rectify images during training. Consequently, our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset. We also demonstrate the generalization of our trained model in ScanNet and Make3D, and the universality of our proposed learning method on the 7-Scenes and KITTI datasets.
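The abstract's core idea of removing the relative rotation between training frames can be pictured as a homography warp: a pure rotation R between two views of the same camera induces the homography H = K R K^-1 on the image plane. The Python sketch below is not the authors' actual pipeline; it only illustrates the general rotation-compensation idea, assuming a known shared intrinsic matrix K and an externally estimated relative rotation R_ab, and using OpenCV for the warp. After such a warp, only the translation-induced parallax remains between the two frames, which is the part that provides the supervision signal for depth.

import numpy as np
import cv2  # OpenCV, used here only for Rodrigues and the perspective warp

def remove_relative_rotation(img_b, K, R_ab):
    """Warp image B so its orientation matches camera A (illustrative sketch).

    img_b : HxW(x3) image taken from pose B.
    K     : 3x3 camera intrinsics, assumed shared by both views.
    R_ab  : 3x3 relative rotation taking camera-A coordinates to camera-B
            coordinates (convention is an assumption; flip the transpose if
            your rotation is defined the other way round).
    """
    # A pure rotation induces H = K R K^-1; warping B by the inverse rotation
    # (R^-1 = R^T) aligns it with A's orientation, up to parallax.
    H = K @ R_ab.T @ np.linalg.inv(K)
    h, w = img_b.shape[:2]
    return cv2.warpPerspective(img_b, H, (w, h))

# Toy usage: compensate a 10-degree yaw between two frames.
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0.,   0.,   1.]])
R_ab, _ = cv2.Rodrigues(np.array([0., np.deg2rad(10.), 0.]))
img_b = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder frame
rectified_b = remove_relative_rotation(img_b, K, R_ab)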

Detailed Description

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), 12, 02 Dec., pages 9802-9813
Main Author: Bian, Jia-Wang (Author)
Other Authors: Zhan, Huangying, Wang, Naiyan, Chin, Tat-Jun, Shen, Chunhua, Reid, Ian
Format: Online Article
Language: English
Published: 2022
Parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article, Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM334556708
003 DE-627
005 20231225223946.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2021.3136220  |2 doi 
028 5 2 |a pubmed24n1115.xml 
035 |a (DE-627)NLM334556708 
035 |a (NLM)34919516 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Bian, Jia-Wang  |e verfasserin  |4 aut 
245 1 0 |a Auto-Rectify Network for Unsupervised Indoor Depth Estimation 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 09.11.2022 
500 |a Date Revised 19.11.2022 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a Single-view depth estimation using CNNs trained from unlabelled videos has shown significant promise. However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices. In this work, we establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth. Our fundamental analysis suggests that the rotation behaves as noise during training, as opposed to the translation (baseline), which provides supervision signals. To address the challenge, we propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning. The significantly improved performance validates our motivation. Towards end-to-end learning without requiring pre-processing, we propose an Auto-Rectify Network with novel loss functions, which can automatically learn to rectify images during training. Consequently, our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset. We also demonstrate the generalization of our trained model in ScanNet and Make3D, and the universality of our proposed learning method on the 7-Scenes and KITTI datasets. 
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Zhan, Huangying  |e verfasserin  |4 aut 
700 1 |a Wang, Naiyan  |e verfasserin  |4 aut 
700 1 |a Chin, Tat-Jun  |e verfasserin  |4 aut 
700 1 |a Shen, Chunhua  |e verfasserin  |4 aut 
700 1 |a Reid, Ian  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 44(2022), 12 vom: 02. Dez., Seite 9802-9813  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:44  |g year:2022  |g number:12  |g day:02  |g month:12  |g pages:9802-9813 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2021.3136220  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 44  |j 2022  |e 12  |b 02  |c 12  |h 9802-9813