Learning Domain Invariant Prompt for Vision-Language Models

Prompt learning stands out as one of the most efficient approaches for adapting powerful vision-language foundational models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, despite its success in achieving remarkable performance on in-domain data, prompt learning still faces the significant challenge of effectively generalizing to novel classes and domains. Some existing methods address this concern by dynamically generating distinct prompts for different domains. Yet, they overlook the inherent potential of prompts to generalize across unseen domains. To address these limitations, our study introduces an innovative prompt learning paradigm, called MetaPrompt, aiming to directly learn domain invariant prompt in few-shot scenarios. To facilitate learning prompts for image and text inputs independently, we present a dual-modality prompt tuning network comprising two pairs of coupled encoders. Our study centers on an alternate episodic training algorithm to enrich the generalization capacity of the learned prompts. In contrast to traditional episodic training algorithms, our approach incorporates both in-domain updates and domain-split updates in a batch-wise manner. For in-domain updates, we introduce a novel asymmetric contrastive learning paradigm, where representations from the pre-trained encoder assume supervision to regularize prompts from the prompted encoder. To enhance performance on out-of-domain distribution, we propose a domain-split optimization on visual prompts for cross-domain tasks or textual prompts for cross-class tasks during domain-split updates. Extensive experiments across 11 datasets for base-to-new generalization and 4 datasets for domain generalization exhibit favorable performance. Compared with the state-of-the-art method, MetaPrompt achieves an absolute gain of 1.02% on the overall harmonic mean in base-to-new generalization and consistently demonstrates superiority over all benchmarks in domain generalization.
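The alternating training scheme described in the abstract — in-domain updates where frozen pretrained features supervise the prompted encoder, interleaved with updates restricted to one pseudo-domain split of the batch — can be sketched with a toy example. This is a minimal illustration only, not the paper's implementation: the linear "encoders" `W`, the additive prompt `p`, the squared-error stand-in for the asymmetric contrastive loss, and the random pseudo-domain labels are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the paper's components:
# W  - a frozen "pretrained encoder" (here just a linear map)
# p  - the learnable prompt vector, added to inputs before encoding
D = 8
W = rng.normal(size=(D, D))           # frozen pretrained weights
p = np.zeros(D)                       # learnable prompt (starts neutral)
X = rng.normal(size=(32, D))          # one batch of inputs
domain = rng.integers(0, 2, size=32)  # pseudo domain labels for the split

def encode(x, prompt):
    """Prompted encoder: here the prompt is simply added to the input."""
    return (x + prompt) @ W.T

def loss_and_grad(x, prompt, target):
    """Asymmetric supervision: `target` comes from the frozen pretrained
    encoder and receives no gradient; only the prompt is updated."""
    diff = encode(x, prompt) - target
    loss = 0.5 * np.mean(np.sum(diff ** 2, axis=1))
    grad = (diff @ W).mean(axis=0)    # d(loss) / d(prompt)
    return loss, grad

# Frozen "pretrained" features the prompted encoder is regularized toward.
target = X @ W.T + 0.1

lr = 0.01
history = []
for step in range(50):
    # In-domain update: whole batch, supervised by the frozen features.
    loss, g = loss_and_grad(X, p, target)
    p -= lr * g
    # Domain-split update: step on one pseudo-domain only, so the prompt
    # cannot simply overfit a single domain's statistics.
    src = domain == (step % 2)
    _, g_src = loss_and_grad(X[src], p, target[src])
    p -= lr * g_src
    history.append(loss)
```

Under these assumptions the loss decreases monotonically (plain gradient descent on a quadratic), which mimics only the *structure* of the alternating updates, not the actual contrastive objective or meta-learning dynamics of MetaPrompt.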

Detailed Description

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024) from: 09, pages 1348-1360
Main Author: Zhao, Cairong (Author)
Other Authors: Wang, Yubin, Jiang, Xinyang, Shen, Yifei, Song, Kaitao, Li, Dongsheng, Miao, Duoqian
Format: Online Article
Language: English
Part of: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000caa a22002652 4500
001 NLM368245861
003 DE-627
005 20240215232049.0
007 cr uuu---uuuuu
008 240210s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2024.3362062  |2 doi 
028 5 2 |a pubmed24n1294.xml 
035 |a (DE-627)NLM368245861 
035 |a (NLM)38335087 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Zhao, Cairong  |e verfasserin  |4 aut 
245 1 0 |a Learning Domain Invariant Prompt for Vision-Language Models 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 15.02.2024 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Prompt learning stands out as one of the most efficient approaches for adapting powerful vision-language foundational models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, despite its success in achieving remarkable performance on in-domain data, prompt learning still faces the significant challenge of effectively generalizing to novel classes and domains. Some existing methods address this concern by dynamically generating distinct prompts for different domains. Yet, they overlook the inherent potential of prompts to generalize across unseen domains. To address these limitations, our study introduces an innovative prompt learning paradigm, called MetaPrompt, aiming to directly learn domain invariant prompt in few-shot scenarios. To facilitate learning prompts for image and text inputs independently, we present a dual-modality prompt tuning network comprising two pairs of coupled encoders. Our study centers on an alternate episodic training algorithm to enrich the generalization capacity of the learned prompts. In contrast to traditional episodic training algorithms, our approach incorporates both in-domain updates and domain-split updates in a batch-wise manner. For in-domain updates, we introduce a novel asymmetric contrastive learning paradigm, where representations from the pre-trained encoder assume supervision to regularize prompts from the prompted encoder. To enhance performance on out-of-domain distribution, we propose a domain-split optimization on visual prompts for cross-domain tasks or textual prompts for cross-class tasks during domain-split updates. Extensive experiments across 11 datasets for base-to-new generalization and 4 datasets for domain generalization exhibit favorable performance. Compared with the state-of-the-art method, MetaPrompt achieves an absolute gain of 1.02% on the overall harmonic mean in base-to-new generalization and consistently demonstrates superiority over all benchmarks in domain generalization. 
650 4 |a Journal Article 
700 1 |a Wang, Yubin  |e verfasserin  |4 aut 
700 1 |a Jiang, Xinyang  |e verfasserin  |4 aut 
700 1 |a Shen, Yifei  |e verfasserin  |4 aut 
700 1 |a Song, Kaitao  |e verfasserin  |4 aut 
700 1 |a Li, Dongsheng  |e verfasserin  |4 aut 
700 1 |a Miao, Duoqian  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 33(2024) vom: 09., Seite 1348-1360  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:33  |g year:2024  |g day:09  |g pages:1348-1360 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2024.3362062  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 33  |j 2024  |b 09  |h 1348-1360