Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video Question Answering
Due to the rich spatio-temporal visual content and complex multimodal relations, Video Question Answering (VideoQA) has become a challenging task and attracted increasing attention. Current methods usually leverage visual attention, linguistic attention, or self-attention to uncover latent correlati...
Published in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), dated: 19., pages 1684-1696 |
---|---|
Main author: | |
Other authors: | , , , |
Format: | Online article |
Language: | English |
Published: | 2022 |
Collection access: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |
Subjects: | Journal Article |
Online access: | Full text |