|
|
|
|
LEADER |
01000caa a22002652 4500 |
001 |
NLM344954587 |
003 |
DE-627 |
005 |
20240214232549.0 |
007 |
cr uuu---uuuuu |
008 |
231226s2024 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TIP.2022.3197972
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1292.xml
|
035 |
|
|
|a (DE-627)NLM344954587
|
035 |
|
|
|a (NLM)35976823
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Shi, Zhangxiang
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Decoupled Cross-Modal Phrase-Attention Network for Image-Sentence Matching
|
264 |
|
1 |
|c 2024
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Revised 14.02.2024
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status PubMed-not-MEDLINE
|
520 |
|
|
|a The mainstream of image and sentence matching studies currently focuses on fine-grained alignment of image regions and sentence words. However, these methods miss a crucial fact: the correspondence between images and sentences does not simply come from alignments between individual regions and words but from alignments between the phrases they form respectively. In this work, we propose a novel Decoupled Cross-modal Phrase-Attention network (DCPA) for image-sentence matching by modeling the relationships between textual phrases and visual phrases. Furthermore, we design a novel decoupled manner for training and inferencing, which is able to release the trade-off for bi-directional retrieval, where image-to-sentence matching is executed in textual semantic space and sentence-to-image matching is executed in visual semantic space. Extensive experimental results on Flickr30K and MS-COCO demonstrate that the proposed method outperforms state-of-the-art methods by a large margin, and can compete with some methods introducing external knowledge.
|
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Zhang, Tianzhu
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Wei, Xi
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Wu, Feng
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Zhang, Yongdong
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
|d 1992
|g 33(2024) vom: 19., Seite 1326-1337
|w (DE-627)NLM09821456X
|x 1941-0042
|7 nnns
|
773 |
1 |
8 |
|g volume:33
|g year:2024
|g day:19
|g pages:1326-1337
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TIP.2022.3197972
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 33
|j 2024
|b 19
|h 1326-1337
|