Unifying Flow, Stereo and Depth Estimation

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense co...
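
The abstract's central idea, treating matching as a comparison of per-pixel feature similarities, can be illustrated with a minimal sketch for one scanline of a rectified stereo pair. This is not the authors' implementation. The function names (`softmax`, `match_row`, `disparity_row`), the scaling by the square root of the feature dimension, and the softmax-weighted average of positions are assumptions made for illustration. The paper extracts its features with a Transformer; here the features are simply passed in as lists of numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_row(left_feats, right_feats):
    """For each left pixel, return a soft estimate of its matching x in the right row.

    Assumption of this sketch: similarity is a scaled dot product, and the
    match is the softmax-weighted average of candidate positions.
    """
    dim = len(left_feats[0])
    matched = []
    for f_left in left_feats:
        # Compare this left feature against every candidate in the right row.
        scores = [
            sum(a * b for a, b in zip(f_left, f_right)) / math.sqrt(dim)
            for f_right in right_feats
        ]
        weights = softmax(scores)
        matched.append(sum(w * x for x, w in enumerate(weights)))
    return matched


def disparity_row(left_feats, right_feats):
    """Disparity per left pixel: left x minus the matched right x."""
    matched = match_row(left_feats, right_feats)
    return [x - m for x, m in enumerate(matched)]
```

With sufficiently distinctive features, the softmax concentrates on the single best candidate and the result approaches a hard nearest-feature match. Optical flow would follow the same pattern with a 2D search over candidate positions instead of a 1D search along the scanline.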

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 11, 25 Nov., pages 13941-13958
Main Author: Xu, Haofei (Author)
Other Authors: Zhang, Jing, Cai, Jianfei, Rezatofighi, Hamid, Yu, Fisher, Tao, Dacheng, Geiger, Andreas
Format: Online Article
Language: English
Published: 2023
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM359905854
003 DE-627
005 20231226081855.0
007 cr uuu---uuuuu
008 231226s2023 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2023.3298645  |2 doi 
028 5 2 |a pubmed24n1199.xml 
035 |a (DE-627)NLM359905854 
035 |a (NLM)37490383 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Xu, Haofei  |e verfasserin  |4 aut 
245 1 0 |a Unifying Flow, Stereo and Depth Estimation 
264 1 |c 2023 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 04.10.2023 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. We outperform RAFT with our unified model on the challenging Sintel dataset, and our final model that uses a few additional task-specific refinement steps outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo and depth datasets, while being simpler and more efficient in terms of model design and inference speed 
650 4 |a Journal Article 
700 1 |a Zhang, Jing  |e verfasserin  |4 aut 
700 1 |a Cai, Jianfei  |e verfasserin  |4 aut 
700 1 |a Rezatofighi, Hamid  |e verfasserin  |4 aut 
700 1 |a Yu, Fisher  |e verfasserin  |4 aut 
700 1 |a Tao, Dacheng  |e verfasserin  |4 aut 
700 1 |a Geiger, Andreas  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 45(2023), 11 vom: 25. Nov., Seite 13941-13958  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:45  |g year:2023  |g number:11  |g day:25  |g month:11  |g pages:13941-13958 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2023.3298645  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 45  |j 2023  |e 11  |b 25  |c 11  |h 13941-13958