Event Graph Guided Compositional Spatial-Temporal Reasoning for Video Question Answering

Video question answering (VideoQA) is challenging since it requires the model to extract and combine multi-level visual concepts from local objects to global actions from complex events for compositional reasoning. Existing works represent the video with fixed-duration clip features that make the mo...


Bibliographic Details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024), dated: 02., pages 1109-1121
Main author:	Bai, Ziyi (Author)
Other authors:	Wang, Ruiping, Gao, Difei, Chen, Xilin
Format:	Online article
Language:	English
Published:	2024
Access to the parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article

Available online	Full text