Event-Based Optical Flow via Transforming Into Motion-Dependent View

Event cameras respond to temporal dynamics, helping to resolve ambiguities in spatio-temporal changes for optical flow estimation. However, the unique spatio-temporal event distribution challenges the feature extraction, and the direct construction of motion representation through the orthogonal vie...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024) vom: 27., Seite 5327-5339
1. Verfasser: Wan, Zengyu (VerfasserIn)
Weitere Verfasser: Tan, Ganchao, Wang, Yang, Zhai, Wei, Cao, Yang, Zha, Zheng-Jun
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2024
Zugriff auf das übergeordnete Werk:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Schlagworte:Journal Article
Beschreibung
Zusammenfassung:Event cameras respond to temporal dynamics, helping to resolve ambiguities in spatio-temporal changes for optical flow estimation. However, the unique spatio-temporal event distribution challenges the feature extraction, and the direct construction of motion representation through the orthogonal view is less than ideal due to the entanglement of appearance and motion. This paper proposes to transform the orthogonal view into a motion-dependent one for enhancing event-based motion representation and presents a Motion View-based Network (MV-Net) for practical optical flow estimation. Specifically, this motion-dependent view transformation is achieved through the Event View Transformation Module, which captures the relationship between the steepest temporal changes and motion direction, incorporating these temporal cues into the view transformation process for feature gathering. This module includes two phases: extracting the temporal evolution clues by central difference operation in the extraction phase and capturing the motion pattern by evolution-guided deformable convolution in the perception phase. Besides, the MV-Net constructs an eccentric downsampling process to avoid response weakening from the sparsity of events in the downsampling stage. The whole network is trained end-to-end in a self-supervised manner, and the evaluations conducted on four challenging datasets reveal the superior performance of the proposed model compared to state-of-the-art (SOTA) methods
Beschreibung:Date Revised 27.09.2024
published: Print-Electronic
Citation Status PubMed-not-MEDLINE
ISSN:1941-0042
DOI:10.1109/TIP.2024.3426469