Long Short-Term Relation Transformer With Global Gating for Video Captioning
Video captioning aims to generate a natural language sentence that describes the main content of a video. Because videos contain multiple objects, fully exploring the spatial and temporal relationships among them is crucial for this task. The previous methods wrap the detected objects as...
Bibliographic details
Published in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022) dated: 24., pages 2726-2738 |
Main author: | Li, Liang (Author) |
Other authors: | Gao, Xingyu; Deng, Jincan; Tu, Yunbin; Zha, Zheng-Jun; Huang, Qingming |
Format: | Online article |
Language: | English |
Published: | 2022 |
Access to the collection: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |
Subjects: | Journal Article |