Dual-View Alignment Learning With Hierarchical-Prompt for Class-Imbalance Multi-Label Image Classification

Real-world datasets often exhibit class imbalance across multiple categories, manifesting as long-tailed distributions and few-shot scenarios. This is especially challenging in Class-Imbalanced Multi-Label Image Classification (CI-MLIC) tasks, where data imbalance and multi-object recognition presen...

Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 34(2025), day 30, pages 5989-6001
Main author:	Huang, Sheng (Author)
Other authors:	Yan, Jiexuan, Liu, Beiyan, Liu, Bo, Hong, Richang
Format:	Online article
Language:	English
Published:	2025
Collection access:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article


LEADER	01000caa a22002652c 4500
001	NLM392755807
003	DE-627
005	20250930234027.0
007	cr uuu---uuuuu
008	250920s2025 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2025.3609185 \|2 doi
028	5	2	\|a pubmed25n1585.xml
035			\|a (DE-627)NLM392755807
035			\|a (NLM)40966154
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Huang, Sheng \|e verfasserin \|4 aut
245	1	0	\|a Dual-View Alignment Learning With Hierarchical-Prompt for Class-Imbalance Multi-Label Image Classification
264		1	\|c 2025
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 29.09.2025
500			\|a published: Print
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Real-world datasets often exhibit class imbalance across multiple categories, manifesting as long-tailed distributions and few-shot scenarios. This is especially challenging in Class-Imbalanced Multi-Label Image Classification (CI-MLIC) tasks, where data imbalance and multi-object recognition present significant obstacles. To address these challenges, we propose a novel method termed Dual-View Alignment Learning with Hierarchical Prompt (HP-DVAL), which leverages multi-modal knowledge from vision-language pretrained (VLP) models to mitigate the class-imbalance problem in multi-label settings. Specifically, HP-DVAL employs dual-view alignment learning to transfer the powerful feature representation capabilities from VLP models by extracting complementary features for accurate image-text alignment. To better adapt VLP models for CI-MLIC tasks, we introduce a hierarchical prompt-tuning strategy that utilizes global and local prompts to learn task-specific and context-related prior knowledge. Additionally, we design a semantic consistency loss during prompt tuning to prevent learned prompts from deviating from general knowledge embedded in VLP models. The effectiveness of our approach is validated on two CI-MLIC benchmarks: MS-COCO and VOC2007. Extensive experimental results demonstrate the superiority of our method over SOTA approaches, achieving mAP improvements of 10.0% and 5.2% on the long-tailed multi-label image classification task, and 6.8% and 2.9% on the multi-label few-shot image classification task
650		4	\|a Journal Article
700	1		\|a Yan, Jiexuan \|e verfasserin \|4 aut
700	1		\|a Liu, Beiyan \|e verfasserin \|4 aut
700	1		\|a Liu, Bo \|e verfasserin \|4 aut
700	1		\|a Hong, Richang \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g 34(2025) vom: 30., Seite 5989-6001 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnas
773	1	8	\|g volume:34 \|g year:2025 \|g day:30 \|g pages:5989-6001
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2025.3609185 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 34 \|j 2025 \|b 30 \|h 5989-6001