Study of Spatio-Temporal Modeling in Video Quality Assessment

Video quality assessment (VQA) has received remarkable attention recently. Most popular VQA models employ recurrent neural networks (RNNs) to capture the temporal quality variation of videos. However, each long-term video sequence is commonly labeled with a single quality score, with which RNNs might not be able to learn long-term quality variation well. What is the real role of RNNs in learning the visual quality of videos? Do they learn spatio-temporal representations as expected, or just redundantly aggregate spatial features? In this work, we conduct a comprehensive study by training a family of VQA models with carefully designed frame sampling strategies and spatio-temporal fusion methods. Our extensive experiments on four publicly available in-the-wild video quality datasets lead to two main findings. First, the plausible spatio-temporal modeling module (i.e., RNNs) does not facilitate quality-aware spatio-temporal feature learning. Second, sparsely sampled video frames achieve competitive performance compared with using all video frames as input. In other words, spatial features play a vital role in capturing video quality variation for VQA. To the best of our knowledge, this is the first work to explore the issue of spatio-temporal modeling in VQA.

Detailed description

Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023): 20, pages 2693-2702
First author: Fang, Yuming (author)
Other authors: Li, Zhaoqian, Yan, Jiebin, Sui, Xiangjie, Liu, Hantao
Format: Online article
Language: English
Published: 2023
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM356491269
003 DE-627
005 20250304175922.0
007 cr uuu---uuuuu
008 231226s2023 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2023.3272480  |2 doi 
028 5 2 |a pubmed25n1187.xml 
035 |a (DE-627)NLM356491269 
035 |a (NLM)37145945 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Fang, Yuming  |e verfasserin  |4 aut 
245 1 0 |a Study of Spatio-Temporal Modeling in Video Quality Assessment 
264 1 |c 2023 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 17.05.2023 
500 |a Date Revised 17.05.2023 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Video quality assessment (VQA) has received remarkable attention recently. Most popular VQA models employ recurrent neural networks (RNNs) to capture the temporal quality variation of videos. However, each long-term video sequence is commonly labeled with a single quality score, with which RNNs might not be able to learn long-term quality variation well. What is the real role of RNNs in learning the visual quality of videos? Do they learn spatio-temporal representations as expected, or just redundantly aggregate spatial features? In this work, we conduct a comprehensive study by training a family of VQA models with carefully designed frame sampling strategies and spatio-temporal fusion methods. Our extensive experiments on four publicly available in-the-wild video quality datasets lead to two main findings. First, the plausible spatio-temporal modeling module (i.e., RNNs) does not facilitate quality-aware spatio-temporal feature learning. Second, sparsely sampled video frames achieve competitive performance compared with using all video frames as input. In other words, spatial features play a vital role in capturing video quality variation for VQA. To the best of our knowledge, this is the first work to explore the issue of spatio-temporal modeling in VQA. 
650 4 |a Journal Article 
700 1 |a Li, Zhaoqian  |e verfasserin  |4 aut 
700 1 |a Yan, Jiebin  |e verfasserin  |4 aut 
700 1 |a Sui, Xiangjie  |e verfasserin  |4 aut 
700 1 |a Liu, Hantao  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 32(2023) vom: 20., Seite 2693-2702  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnas 
773 1 8 |g volume:32  |g year:2023  |g day:20  |g pages:2693-2702 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2023.3272480  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 32  |j 2023  |b 20  |h 2693-2702