MISC : Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, all existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. During recent years, th...

Description complète

Détails bibliographiques
Publié dans:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - PP(2024) vom: 27. Dez.
Auteur principal: Li, Chunyi (Auteur)
Autres auteurs: Lu, Guo, Feng, Donghui, Wu, Haoning, Zhang, Zicheng, Liu, Xiaohong, Zhai, Guangtao, Lin, Weisi, Zhang, Wenjun
Format: Article en ligne
Langue:English
Publié: 2024
Accès à la collection:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Sujets:Journal Article
Description
Résumé:With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, all existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. During recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To solve this problem, this paper proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to the semantic, an image encoder generates an extremely compressed bitstream, and a decoder reconstructs the image based on the above information. Experimental results show that our proposed MISC is suitable for compressing both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs) content. It can achieve optimal consistency and perception results while saving 50% bitrate, which has strong potential applications in the next generation of storage and communication. The code will be released on https://github.com/lcysyzxdxc/MISC
Description:Date Revised 03.03.2025
published: Print-Electronic
Citation Status Publisher
ISSN:1941-0042
DOI:10.1109/TIP.2024.3515874