CLSA : A Contrastive Learning Framework With Selective Aggregation for Video Rescaling


Bibliographic details

Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - Vol. 32 (2023), pp. 1300-1314
Main author: Tian, Yuan (Author)
Other authors: Yan, Yichao, Zhai, Guangtao, Chen, Li, Gao, Zhiyong
Format: Online article
Language: English
Published: 2023
Collection: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
Description
Abstract: Video rescaling has recently drawn extensive attention for its practical applications such as video compression. Compared to video super-resolution, which focuses on upscaling bicubic-downscaled videos, video rescaling methods jointly optimize a downscaler and an upscaler. However, the inevitable loss of information during downscaling makes the upscaling procedure still ill-posed. Furthermore, the network architecture of previous methods mostly relies on convolution to aggregate information within local regions, which cannot effectively capture the relationship between distant locations. To address the above two issues, we propose a unified video rescaling framework by introducing the following designs. First, we propose to regularize the information of the downscaled videos via a contrastive learning framework, where, in particular, hard negative samples for learning are synthesized online. With this auxiliary contrastive learning objective, the downscaler tends to retain more information that benefits the upscaler. Second, we present a selective global aggregation module (SGAM) to efficiently capture long-range redundancy in high-resolution videos, where only a few representative locations are adaptively selected to participate in the computationally heavy self-attention (SA) operations. SGAM enjoys the efficiency of the sparse modeling scheme while preserving the global modeling capability of SA. We refer to the proposed framework as Contrastive Learning framework with Selective Aggregation (CLSA) for video rescaling. Comprehensive experimental results show that CLSA outperforms video rescaling and rescaling-based video compression methods on five datasets, achieving state-of-the-art performance.
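The selection-based sparse attention scheme described in the abstract can be illustrated with a toy sketch. This is a minimal single-head NumPy illustration of the general idea (every location attends only to a few high-scoring "representative" locations), assuming a hypothetical linear scoring function; it is not the paper's actual SGAM implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def selective_attention(x, w_score, k):
    """Toy selection-based sparse attention (illustrative, not SGAM).

    x:        (N, C) flattened spatial features of one frame
    w_score:  (C,)   hypothetical linear scoring weights
    k:        number of representative locations to select
    """
    scores = x @ w_score                 # (N,) importance score per location
    idx = np.argsort(scores)[-k:]        # indices of the top-k locations
    kv = x[idx]                          # (k, C) selected keys/values
    # All N locations query only the k selected locations, so the
    # attention map is (N, k) instead of the dense (N, N).
    attn = softmax(x @ kv.T / np.sqrt(x.shape[1]), axis=-1)
    return attn @ kv                     # (N, C) aggregated features
```

The point of the sketch is the cost: dense self-attention over N locations is O(N^2), while attending to k selected locations is O(N·k), which is what makes global aggregation affordable on high-resolution video features.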
Description: Date Revised 04.04.2025
published: Print-Electronic
Citation Status PubMed-not-MEDLINE
ISSN:1941-0042
DOI:10.1109/TIP.2023.3242774