X²-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks

Vision language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods only learn image-text alignments. Some others utilize pre-trained object detectors to leverage vision language alignments at the object level. In this paper, we propos...

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 5, dated 1 Apr., pages 3156-3168
Main Author: Zeng, Yan (Author)
Other Authors: Zhang, Xinsong; Li, Hang; Wang, Jiawei; Zhang, Jipeng; Zhou, Wangchunshu
Format: Online Article
Language: English
Published: 2024
Access to parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article