Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression

Block based motion estimation is integral to inter prediction processes performed in hybrid video codecs. Prevalent block matching based methods that are used to compute block motion vectors (MVs) rely on computationally intensive search procedures. They also suffer from the aperture problem, which...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - PP(2022) vom: 27. Dez.
1. Verfasser: Paul, Somdyuti (VerfasserIn)
Weitere Verfasser: Norkin, Andrey, Bovik, Alan C
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2022
Zugriff auf das übergeordnete Werk:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Schlagworte:Journal Article
LEADER 01000naa a22002652 4500
001 NLM355202468
003 DE-627
005 20231226063840.0
007 cr uuu---uuuuu
008 231226s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2022.3231082  |2 doi 
028 5 2 |a pubmed24n1183.xml 
035 |a (DE-627)NLM355202468 
035 |a (NLM)37015500 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Paul, Somdyuti  |e verfasserin  |4 aut 
245 1 0 |a Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a ƒaComputermedien  |b c  |2 rdamedia 
338 |a ƒa Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 04.04.2023 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Block based motion estimation is integral to inter prediction processes performed in hybrid video codecs. Prevalent block matching based methods that are used to compute block motion vectors (MVs) rely on computationally intensive search procedures. They also suffer from the aperture problem, which tends to worsen as the block size is reduced. Moreover, the block matching criteria used in typical codecs do not account for the resulting levels of perceptual quality of the motion compensated pictures that are created upon decoding. Towards achieving the elusive goal of perceptually optimized motion estimation, we propose a search-free block motion estimation framework using a multi-stage convolutional neural network, which is able to conduct motion estimation on multiple block sizes simultaneously, using a triplet of frames as input. This composite block translation network (CBT-Net) is trained in a self-supervised manner on a large database that we created from publicly available uncompressed video content. We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames. Our experimental results highlight the computational efficiency of our proposed model relative to conventional block matching based motion estimation algorithms, for comparable prediction errors. Further, when used to perform inter prediction in AV1, the MV predictions of the perceptually optimized model result in average Bjontegaard-delta rate (BD-rate) improvements of -1.73% and -1.31% with respect to the MS-SSIM and Video Multi-Method Assessment Fusion (VMAF) quality metrics, respectively, as compared to the block matching based motion estimation system employed in the SVT-AV1 encoder 
650 4 |a Journal Article 
700 1 |a Norkin, Andrey  |e verfasserin  |4 aut 
700 1 |a Bovik, Alan C  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g PP(2022) vom: 27. Dez.  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:PP  |g year:2022  |g day:27  |g month:12 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2022.3231082  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2022  |b 27  |c 12