Dual-View Alignment Learning With Hierarchical-Prompt for Class-Imbalance Multi-Label Image Classification

Real-world datasets often exhibit class imbalance across multiple categories, manifesting as long-tailed distributions and few-shot scenarios. This is especially challenging in Class-Imbalanced Multi-Label Image Classification (CI-MLIC) tasks, where data imbalance and multi-object recognition presen...

Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 34 (2025), day 30, pages 5989-6001
Main author: Huang, Sheng (Author)
Other authors: Yan, Jiexuan, Liu, Beiyan, Liu, Bo, Hong, Richang
Format: Online article
Language: English
Published: 2025
Collection access: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM392755807
003 DE-627
005 20250930234027.0
007 cr uuu---uuuuu
008 250920s2025 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2025.3609185  |2 doi 
028 5 2 |a pubmed25n1585.xml 
035 |a (DE-627)NLM392755807 
035 |a (NLM)40966154 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Huang, Sheng  |e verfasserin  |4 aut 
245 1 0 |a Dual-View Alignment Learning With Hierarchical-Prompt for Class-Imbalance Multi-Label Image Classification 
264 1 |c 2025 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 29.09.2025 
500 |a published: Print 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Real-world datasets often exhibit class imbalance across multiple categories, manifesting as long-tailed distributions and few-shot scenarios. This is especially challenging in Class-Imbalanced Multi-Label Image Classification (CI-MLIC) tasks, where data imbalance and multi-object recognition present significant obstacles. To address these challenges, we propose a novel method termed Dual-View Alignment Learning with Hierarchical Prompt (HP-DVAL), which leverages multi-modal knowledge from vision-language pretrained (VLP) models to mitigate the class-imbalance problem in multi-label settings. Specifically, HP-DVAL employs dual-view alignment learning to transfer the powerful feature representation capabilities from VLP models by extracting complementary features for accurate image-text alignment. To better adapt VLP models for CI-MLIC tasks, we introduce a hierarchical prompt-tuning strategy that utilizes global and local prompts to learn task-specific and context-related prior knowledge. Additionally, we design a semantic consistency loss during prompt tuning to prevent learned prompts from deviating from general knowledge embedded in VLP models. The effectiveness of our approach is validated on two CI-MLIC benchmarks: MS-COCO and VOC2007. Extensive experimental results demonstrate the superiority of our method over SOTA approaches, achieving mAP improvements of 10.0% and 5.2% on the long-tailed multi-label image classification task, and 6.8% and 2.9% on the multi-label few-shot image classification task 
650 4 |a Journal Article 
700 1 |a Yan, Jiexuan  |e verfasserin  |4 aut 
700 1 |a Liu, Beiyan  |e verfasserin  |4 aut 
700 1 |a Liu, Bo  |e verfasserin  |4 aut 
700 1 |a Hong, Richang  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 34(2025) vom: 30., Seite 5989-6001  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnas 
773 1 8 |g volume:34  |g year:2025  |g day:30  |g pages:5989-6001 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2025.3609185  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 34  |j 2025  |b 30  |h 5989-6001