Learning Optical Flow and Scene Flow With Bidirectional Camera-LiDAR Fusion

In this paper, we study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline that splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 4 vom: 07. März, Seite 2378-2395
1. Verfasser: Liu, Haisong (VerfasserIn)
Weitere Verfasser: Lu, Tao, Xu, Yihui, Liu, Jia, Wang, Limin
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2024
Zugriff auf das übergeordnete Werk:IEEE transactions on pattern analysis and machine intelligence
Schlagworte:Journal Article
LEADER 01000caa a22002652 4500
001 NLM364264365
003 DE-627
005 20240308232134.0
007 cr uuu---uuuuu
008 231226s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2023.3330866  |2 doi 
028 5 2 |a pubmed24n1320.xml 
035 |a (DE-627)NLM364264365 
035 |a (NLM)37934646 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Liu, Haisong  |e verfasserin  |4 aut 
245 1 0 |a Learning Optical Flow and Scene Flow With Bidirectional Camera-LiDAR Fusion 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a ƒaComputermedien  |b c  |2 rdamedia 
338 |a ƒa Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 08.03.2024 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a In this paper, we study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline that splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late-fusion" manner. Such one-size-fits-all approaches suffer from a dilemma of failing to fully utilize the characteristic of each modality or to maximize the inter-modality complementarity. To address the problem, we propose a novel end-to-end framework, which consists of 2D and 3D branches with multiple bidirectional fusion connections between them in specific layers. Different from previous work, we apply a point-based 3D branch to extract the LiDAR features, as it preserves the geometric structure of point clouds. To fuse dense image features and sparse point features, we propose a learnable operator named bidirectional camera-LiDAR fusion module (Bi-CLFM). We instantiate two types of the bidirectional fusion pipeline, one based on the pyramidal coarse-to-fine architecture (dubbed CamLiPWC), and the other one based on the recurrent all-pairs field transforms (dubbed CamLiRAFT). On FlyingThings3D, both CamLiPWC and CamLiRAFT surpass all existing methods and achieve up to a 47.9% reduction in 3D end-point-error from the best published result. Our best-performing model, CamLiRAFT, achieves an error of 4.26% on the KITTI Scene Flow benchmark, ranking 1st among all submissions with much fewer parameters. Besides, our methods have strong generalization performance and the ability to handle non-rigid motion 
650 4 |a Journal Article 
700 1 |a Lu, Tao  |e verfasserin  |4 aut 
700 1 |a Xu, Yihui  |e verfasserin  |4 aut 
700 1 |a Liu, Jia  |e verfasserin  |4 aut 
700 1 |a Wang, Limin  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 46(2024), 4 vom: 07. März, Seite 2378-2395  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:46  |g year:2024  |g number:4  |g day:07  |g month:03  |g pages:2378-2395 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2023.3330866  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 46  |j 2024  |e 4  |b 07  |c 03  |h 2378-2395