Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer

Existing low-light video enhancement methods are dominated by Convolutional Neural Networks (CNNs) that are trained in a supervised manner. Due to the difficulty of collecting paired dynamic low/normal-light videos in real-world scenes, they are usually trained on synthetic, static, and uniform-motion videos, which undermines their generalization to real-world scenes. Additionally, these methods typically suffer from temporal inconsistency (e.g., flickering artifacts and motion blur) when handling large-scale motions, since the local perception property of CNNs limits their ability to model long-range dependencies in both the spatial and temporal domains. To address these problems, we propose, to the best of our knowledge, the first unsupervised method for low-light video enhancement, named LightenFormer, which models long-range intra- and inter-frame dependencies with a spatial-temporal co-attention transformer to enhance brightness while maintaining temporal consistency. Specifically, an effective but lightweight S-curve Estimation Network (SCENet) is first proposed to estimate pixel-wise S-shaped non-linear curves (S-curves) that adaptively adjust the dynamic range of an input video. Next, to model the temporal consistency of the video, we present a Spatial-Temporal Refinement Network (STRNet) to refine the enhanced video. The core module of STRNet is a novel Spatial-Temporal Co-attention Transformer (STCAT), which exploits multi-scale self- and cross-attention interactions to capture long-range correlations in both the spatial and temporal domains among frames for implicit motion estimation. To achieve unsupervised training, we further propose two non-reference loss functions based on the invertibility of the S-curve and the noise independence among frames. Extensive experiments on the SDSD and LLIV-Phone datasets demonstrate that our LightenFormer outperforms state-of-the-art methods.
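
The dynamic-range adjustment the abstract describes can be pictured as a learned tone curve applied independently at each pixel. Below is a minimal PyTorch sketch assuming a logistic-style parameterization; the function name `apply_s_curve` and the `alpha`/`beta` parameter maps are illustrative assumptions (the abstract does not specify the actual curve form used by SCENet).

```python
import torch

def apply_s_curve(frame: torch.Tensor, alpha: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """Apply a pixel-wise S-shaped tone curve to a frame in [0, 1].

    frame: (B, 3, H, W) low-light input.
    alpha: (B, 1, H, W) per-pixel steepness map (assumed output of a network like SCENet).
    beta:  (B, 1, H, W) per-pixel inflection-point map.
    The logistic form below is an illustrative assumption, not the paper's formulation.
    """
    # Logistic curve: larger alpha gives a steeper S; beta shifts the
    # inflection point along the intensity axis.
    s = torch.sigmoid(alpha * (frame - beta))
    # Renormalize so the curve maps 0 -> 0 and 1 -> 1 exactly,
    # keeping the output a valid image in [0, 1].
    s0 = torch.sigmoid(alpha * (0.0 - beta))
    s1 = torch.sigmoid(alpha * (1.0 - beta))
    return (s - s0) / (s1 - s0 + 1e-8)
```

A strictly monotone curve like this has a closed-form inverse, which is the kind of property a non-reference loss "based on the invertibility of the S-curve", as the abstract puts it, could exploit.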
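
Likewise, the cross-attention half of the described STCAT module can be sketched generically: queries from a reference frame attend over a neighboring frame's features, so the attention weights act as soft, implicit motion correspondences. Everything here (class name, single scale, head count) is an assumption for illustration; the paper's multi-scale self- and co-attention design is only summarized in the abstract.

```python
import torch
import torch.nn as nn

class FrameCrossAttention(nn.Module):
    """Generic cross-attention from a reference frame to a neighbor frame.

    A single-scale sketch of the idea: queries come from the reference
    frame, keys/values from a neighboring frame, so attention acts as
    implicit motion estimation. The multi-scale and self-attention parts
    of the paper's module are omitted.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, ref: torch.Tensor, nbr: torch.Tensor) -> torch.Tensor:
        # ref, nbr: (B, C, H, W) feature maps of two frames.
        b, c, h, w = ref.shape
        q = ref.flatten(2).transpose(1, 2)   # (B, H*W, C) queries from reference
        kv = nbr.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values from neighbor
        out, _ = self.attn(q, kv, kv)        # soft correspondences across frames
        return out.transpose(1, 2).reshape(b, c, h, w)
```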

Bibliographic Details

Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023) from: 07., pages 4701-4715
First author: Lv, Xiaoqian (author)
Other authors: Zhang, Shengping; Wang, Chenyang; Zhang, Weigang; Yao, Hongxun; Huang, Qingming
Format: Online article
Language: English
Published: 2023
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Keywords: Journal Article
LEADER 01000naa a22002652 4500
001 NLM360487416
003 DE-627
005 20231226083122.0
007 cr uuu---uuuuu
008 231226s2023 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2023.3301332  |2 doi 
028 5 2 |a pubmed24n1201.xml 
035 |a (DE-627)NLM360487416 
035 |a (NLM)37549080 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Lv, Xiaoqian  |e verfasserin  |4 aut 
245 1 0 |a Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer 
264 1 |c 2023 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia
338 |a Online-Ressource  |b cr  |2 rdacarrier
500 |a Date Revised 16.08.2023 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Existing low-light video enhancement methods are dominated by Convolutional Neural Networks (CNNs) that are trained in a supervised manner. Due to the difficulty of collecting paired dynamic low/normal-light videos in real-world scenes, they are usually trained on synthetic, static, and uniform-motion videos, which undermines their generalization to real-world scenes. Additionally, these methods typically suffer from temporal inconsistency (e.g., flickering artifacts and motion blur) when handling large-scale motions, since the local perception property of CNNs limits their ability to model long-range dependencies in both the spatial and temporal domains. To address these problems, we propose, to the best of our knowledge, the first unsupervised method for low-light video enhancement, named LightenFormer, which models long-range intra- and inter-frame dependencies with a spatial-temporal co-attention transformer to enhance brightness while maintaining temporal consistency. Specifically, an effective but lightweight S-curve Estimation Network (SCENet) is first proposed to estimate pixel-wise S-shaped non-linear curves (S-curves) that adaptively adjust the dynamic range of an input video. Next, to model the temporal consistency of the video, we present a Spatial-Temporal Refinement Network (STRNet) to refine the enhanced video. The core module of STRNet is a novel Spatial-Temporal Co-attention Transformer (STCAT), which exploits multi-scale self- and cross-attention interactions to capture long-range correlations in both the spatial and temporal domains among frames for implicit motion estimation. To achieve unsupervised training, we further propose two non-reference loss functions based on the invertibility of the S-curve and the noise independence among frames. Extensive experiments on the SDSD and LLIV-Phone datasets demonstrate that our LightenFormer outperforms state-of-the-art methods
650 4 |a Journal Article 
700 1 |a Zhang, Shengping  |e verfasserin  |4 aut 
700 1 |a Wang, Chenyang  |e verfasserin  |4 aut 
700 1 |a Zhang, Weigang  |e verfasserin  |4 aut 
700 1 |a Yao, Hongxun  |e verfasserin  |4 aut 
700 1 |a Huang, Qingming  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 32(2023) vom: 07., Seite 4701-4715  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:32  |g year:2023  |g day:07  |g pages:4701-4715 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2023.3301332  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 32  |j 2023  |b 07  |h 4701-4715