Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data
Despite the significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, little effort has been made to explore its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong...
Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 7 (1 June), pp. 4747-4762
First author: Wu, Zuxuan (Author)
Other authors: Weng, Zejia; Peng, Wujian; Yang, Xitong; Li, Ang; Davis, Larry S; Jiang, Yu-Gang
Format: Online article
Language: English
Published: 2024
Parent work: IEEE transactions on pattern analysis and machine intelligence
Keywords: Journal Article