Long Short-Term Relation Transformer With Global Gating for Video Captioning

Video captioning aims to generate a natural language sentence that describes the main content of a video. Because videos contain multiple objects, fully exploring the spatial and temporal relationships among them is crucial for this task. Previous methods wrap the detected objects as...

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), dated: 24., pages 2726-2738
Main Author: Li, Liang (Author)
Other Authors: Gao, Xingyu, Deng, Jincan, Tu, Yunbin, Zha, Zheng-Jun, Huang, Qingming
Format: Online Article
Language: English
Published: 2022
Access to parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article