Divert More Attention to Vision-Language Object Tracking
Multimodal vision-language (VL) learning has noticeably pushed the tendency toward generic intelligence owing to emerging large foundation models. However, tracking, as a fundamental vision problem, surprisingly enjoys less bonus from recent flourishing VL learning. We argue that the reasons are two...
Bibliographic details
Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 12, 4 Nov., pages 8600-8618 |
Main author: | Guo, Mingzhe (Author) |
Other authors: | Zhang, Zhipeng; Jing, Liping; Ling, Haibin; Fan, Heng |
Format: | Online article |
Language: | English |
Published: | 2024 |
Access to parent work: | IEEE transactions on pattern analysis and machine intelligence |
Subjects: | Journal Article |