Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video Question Answering

Due to the rich spatio-temporal visual content and complex multimodal relations, Video Question Answering (VideoQA) has become a challenging task and attracted increasing attention. Current methods usually leverage visual attention, linguistic attention, or self-attention to uncover latent correlati...

Detailed description

Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), dated: 19., pages 1684-1696
First author:	Liu, Yun (Author)
Other authors:	Zhang, Xiaoming, Huang, Feiran, Zhang, Bo, Li, Zhoujun
Format:	Online article
Language:	English
Published:	2022
Access to the parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article

Available online	Full text