LEADER |
01000caa a22002652 4500 |
001 |
NLM366361961 |
003 |
DE-627 |
005 |
20240405233323.0 |
007 |
cr uuu---uuuuu |
008 |
231227s2024 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TPAMI.2023.3346434
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1366.xml
|
035 |
|
|
|a (DE-627)NLM366361961
|
035 |
|
|
|a (NLM)38145530
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Wen, Haokun
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval
|
264 |
|
1 |
|c 2024
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Revised 05.04.2024
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status PubMed-not-MEDLINE
|
520 |
|
|
|a The composed image retrieval (CIR) task aims to retrieve the desired target image for a given multimodal query, i.e., a reference image together with its corresponding modification text. Existing efforts suffer from two key limitations: 1) they ignore the multiple query-target matching factors; and 2) they ignore the potential unlabeled reference-target image pairs in existing benchmark datasets. Addressing these two limitations is non-trivial due to the following challenges: 1) how to effectively model the multiple matching factors in a latent way without direct supervision signals; and 2) how to fully utilize the potential unlabeled reference-target image pairs to improve the generalization ability of the CIR model. To address these challenges, we first propose a CLIP-Transformer based muLtI-factor Matching Network (LIMN), which consists of three key modules: disentanglement-based latent factor tokens mining, dual aggregation-based matching token learning, and dual query-target matching modeling. Thereafter, we design an iterative dual self-training paradigm to further boost LIMN by exploiting the potential unlabeled reference-target image pairs in a weakly supervised manner; we denote the resulting model as LIMN+. Extensive experiments on four datasets, including FashionIQ, Shoes, CIRR, and Fashion200K, show that the proposed LIMN and LIMN+ significantly surpass state-of-the-art baselines.
|
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Song, Xuemeng
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Yin, Jianhua
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Wu, Jianlong
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Guan, Weili
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Nie, Liqiang
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on pattern analysis and machine intelligence
|d 1979
|g 46(2024), 5 vom: 04. Apr., Seite 3665-3678
|w (DE-627)NLM098212257
|x 1939-3539
|7 nnns
|
773 |
1 |
8 |
|g volume:46
|g year:2024
|g number:5
|g day:04
|g month:04
|g pages:3665-3678
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TPAMI.2023.3346434
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 46
|j 2024
|e 5
|b 04
|c 04
|h 3665-3678
|