Unpaired Image-text Matching via Multimodal Aligned Conceptual Knowledge

Recently, the accuracy of image-text matching has been greatly improved by multimodal pretrained models, all of which use millions or billions of paired images and texts for supervised model learning. Different from them, human brains can well match images with texts using their stored multimodal kn...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2024), dated: 23 July
Main author: Huang, Yan (Author)
Other authors: Wang, Yuming; Zeng, Yunan; Huang, Junshi; Chai, Zhenhua; Wang, Liang
Format: Online article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article