NVDS +: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation

Video depth estimation aims to infer temporally consistent depth. One approach is to finetune a single-image model on each video with geometry constraints, which proves inefficient and lacks robustness. An alternative is learning to enforce consistency from data, which requires well-designed models...

Detailed description

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2024), dated 08 Oct.
Main author: Wang, Yiran (Author)
Other authors: Shi, Min, Li, Jiaqi, Hong, Chaoyi, Huang, Zihao, Peng, Juewen, Cao, Zhiguo, Zhang, Jianming, Xian, Ke, Lin, Guosheng
Format: Online article
Language: English
Published: 2024
Access to parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM378648438
003 DE-627
005 20250306182142.0
007 cr uuu---uuuuu
008 241009s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2024.3476387  |2 doi 
028 5 2 |a pubmed25n1261.xml 
035 |a (DE-627)NLM378648438 
035 |a (NLM)39378259 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wang, Yiran  |e verfasserin  |4 aut 
245 1 0 |a NVDS +  |b Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 10.10.2024 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Video depth estimation aims to infer temporally consistent depth. One approach is to finetune a single-image model on each video with geometry constraints, which proves inefficient and lacks robustness. An alternative is learning to enforce consistency from data, which requires well-designed models and sufficient video depth data. To address both challenges, we introduce NVDS + that stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner. We also elaborate a large-scale Video Depth in the Wild (VDW) dataset, which contains 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset. Additionally, a bidirectional inference strategy is designed to improve consistency by adaptively fusing forward and backward predictions. We instantiate a model family ranging from small to large scales for different applications. The method is evaluated on VDW dataset and three public benchmarks. To further prove the versatility, we extend NVDS + to video semantic segmentation and several downstream applications like bokeh rendering, novel view synthesis, and 3D reconstruction. Experimental results show that our method achieves significant improvements in consistency, accuracy, and efficiency. Our work serves as a solid baseline and data foundation for learning-based video depth estimation. Code and dataset are available at: https://github.com/RaymondWang987/NVDS 
650 4 |a Journal Article 
700 1 |a Shi, Min  |e verfasserin  |4 aut 
700 1 |a Li, Jiaqi  |e verfasserin  |4 aut 
700 1 |a Hong, Chaoyi  |e verfasserin  |4 aut 
700 1 |a Huang, Zihao  |e verfasserin  |4 aut 
700 1 |a Peng, Juewen  |e verfasserin  |4 aut 
700 1 |a Cao, Zhiguo  |e verfasserin  |4 aut 
700 1 |a Zhang, Jianming  |e verfasserin  |4 aut 
700 1 |a Xian, Ke  |e verfasserin  |4 aut 
700 1 |a Lin, Guosheng  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g PP(2024) vom: 08. Okt.  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnas 
773 1 8 |g volume:PP  |g year:2024  |g day:08  |g month:10 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2024.3476387  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2024  |b 08  |c 10