Efficient Token-Guided Image-Text Retrieval With Consistent Multimodal Contrastive Training

Image-text retrieval is a central problem for understanding the semantic relationship between vision and language, and serves as the basis for various visual and language tasks. Most previous works either simply learn coarse-grained representations of the overall image and text, or elaborately estab...


Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023), dated: 20., pages 3622-3633
Main author: Liu, Chong (Author)
Other authors: Zhang, Yuqi, Wang, Hongsong, Chen, Weihua, Wang, Fan, Huang, Yan, Shen, Yi-Dong, Wang, Liang
Format: Online article
Language: English
Published: 2023
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article