X²-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks
Vision-language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods learn only image-text alignments. Others use pre-trained object detectors to exploit vision-language alignments at the object level. In this paper, we propos...
Bibliographic Details
Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 5, 1 Apr., pages 3156-3168
Main author: | Zeng, Yan (Author)
Other authors: | Zhang, Xinsong; Li, Hang; Wang, Jiawei; Zhang, Jipeng; Zhou, Wangchunshu
Format: | Online article
Language: | English
Published: | 2024
Access to parent work: | IEEE transactions on pattern analysis and machine intelligence
Subjects: | Journal Article