Self-Supervised Learning of Event-Guided Video Frame Interpolation for Rolling Shutter Frames

Most consumer cameras use rolling shutter (RS) exposure, so the captured videos often suffer from distortions (e.g., skew and the jelly effect). These videos are also impeded by limited bandwidth and frame rate, which inevitably degrades the video streaming experience. In this paper, we excavate the potential ...

Detailed Description

Bibliographic Details
Published in: IEEE transactions on visualization and computer graphics. - 1996. - 31(2025), 10, 06 Sept., pages 8683-8695
First author: Lu, Yunfan (Author)
Other authors: Liang, Guoqiang, Shen, Yiran, Wang, Lin
Format: Online article
Language: English
Published: 2025
Access to parent work: IEEE transactions on visualization and computer graphics
Subjects: Journal Article
LEADER 01000naa a22002652c 4500
001 NLM392016966
003 DE-627
005 20250906233550.0
007 cr uuu---uuuuu
008 250906s2025 xx |||||o 00| ||eng c
024 7 |a 10.1109/TVCG.2025.3576305  |2 doi 
028 5 2 |a pubmed25n1558.xml 
035 |a (DE-627)NLM392016966 
035 |a (NLM)40460006 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Lu, Yunfan  |e verfasserin  |4 aut 
245 1 0 |a Self-Supervised Learning of Event-Guided Video Frame Interpolation for Rolling Shutter Frames 
264 1 |c 2025 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 05.09.2025 
500 |a published: Print 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Most consumer cameras use rolling shutter (RS) exposure, so the captured videos often suffer from distortions (e.g., skew and the jelly effect). These videos are also impeded by limited bandwidth and frame rate, which inevitably degrades the video streaming experience. In this paper, we excavate the potential of event cameras as they enjoy high temporal resolution. Accordingly, we propose a framework to recover global shutter (GS) high-frame-rate (i.e., slow motion) video without RS distortion from an RS camera and an event camera. One challenge is the lack of real-world datasets for supervised training. Therefore, we explore self-supervised learning with the key idea of estimating the displacement field (DF), a non-linear and dense 3D spatiotemporal representation of all pixels during the exposure time. This allows for a mutual reconstruction between RS and GS frames and facilitates slow-motion video recovery. We then combine the input RS frames with the DF to map them to the GS frames (RS-to-GS). Given the under-constrained nature of this mapping, we integrate it with the inverse mapping (GS-to-RS) and RS frame warping (RS-to-RS) for self-supervision. We evaluate our framework via objective analysis (i.e., quantitative and qualitative comparisons on four datasets) and subjective studies (i.e., a user study). The results show that our framework can recover slow-motion videos without distortion, with much lower bandwidth (94% drop) and higher inference speed (16 ms/frame) under 32× frame interpolation. 
650 4 |a Journal Article 
700 1 |a Liang, Guoqiang  |e verfasserin  |4 aut 
700 1 |a Shen, Yiran  |e verfasserin  |4 aut 
700 1 |a Wang, Lin  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on visualization and computer graphics  |d 1996  |g 31(2025), 10 vom: 06. Sept., Seite 8683-8695  |w (DE-627)NLM098269445  |x 1941-0506  |7 nnas 
773 1 8 |g volume:31  |g year:2025  |g number:10  |g day:06  |g month:09  |g pages:8683-8695 
856 4 0 |u http://dx.doi.org/10.1109/TVCG.2025.3576305  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 31  |j 2025  |e 10  |b 06  |c 09  |h 8683-8695
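The abstract above describes a self-supervised cycle: a displacement field (DF) maps each RS frame into GS space (RS-to-GS), the inverse mapping (GS-to-RS) reconstructs the original RS frame, and consistency between these reconstructions supplies the training signal without ground-truth GS frames. The following is a minimal, hypothetical NumPy sketch of that loss structure; the toy row-shift `warp`, the function names, and the loss weighting are illustrative assumptions, not the paper's actual implementation (which would use differentiable sampling and event-guided DF estimation):

```python
import numpy as np

def warp(frame, df):
    # Toy stand-in for DF-based warping: shift each image row by its
    # (rounded) per-row displacement. A real implementation would use
    # differentiable bilinear sampling over a dense 3D displacement field.
    out = np.empty_like(frame)
    for r in range(frame.shape[0]):
        out[r] = np.roll(frame[r], int(round(df[r])))
    return out

def self_supervised_losses(rs0, rs1, df0, df1):
    # RS-to-GS: map both RS frames toward a shared global-shutter target.
    gs_from_rs0 = warp(rs0, df0)
    gs_from_rs1 = warp(rs1, df1)
    # Mutual consistency: the two GS reconstructions should agree.
    l_gs = np.abs(gs_from_rs0 - gs_from_rs1).mean()
    # GS-to-RS: the inverse mapping should recover the original RS frame.
    rs0_rec = warp(gs_from_rs0, -df0)
    l_rs = np.abs(rs0_rec - rs0).mean()
    return l_gs + l_rs

rng = np.random.default_rng(0)
frame = rng.random((8, 16))
df = np.linspace(0.0, 3.0, 8)  # row-wise displacement grows down the frame (RS skew)

# Identical frames with identical DFs are perfectly self-consistent.
loss_zero = self_supervised_losses(frame, frame, df, df)
# Mismatched DFs break GS-space agreement, so the loss becomes positive.
loss_mixed = self_supervised_losses(frame, frame, df, np.zeros(8))
```

The design point this sketch illustrates is that no GS ground truth appears anywhere in the objective: supervision comes entirely from requiring the RS-to-GS, GS-to-RS, and frame-to-frame mappings to be mutually consistent.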