Long Short-Term Relation Transformer With Global Gating for Video Captioning

Video captioning aims to generate a natural language sentence that describes the main content of a video. Because videos contain multiple objects, fully exploring the spatial and temporal relationships among them is crucial for this task. The previous methods wrap the detected objects as...


Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022) dated: 24., pages 2726-2738
Main author: Li, Liang (Author)
Other authors: Gao, Xingyu, Deng, Jincan, Tu, Yunbin, Zha, Zheng-Jun, Huang, Qingming
Format: Online article
Language: English
Published: 2022
Collection access: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article