Dynamic Spatio-Temporal Graph Reasoning for VideoQA With Self-Supervised Event Recognition
Video question answering (VideoQA) requires a comprehensive understanding of the visual content in videos. Existing VideoQA models mainly focus on scenarios involving a single event with simple object interactions and leave event-centric scenarios involving multiple events with dynamically...
Bibliographic details
Published in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024), dated: 10., pages 4145-4158
Main author: | Nie, Jie (author)
Other authors: | Wang, Xin; Hou, Runze; Li, Guohao; Chen, Hong; Zhu, Wenwu
Format: | Online article
Language: | English
Published: | 2024
Access to the parent work: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: | Journal Article