Event Graph Guided Compositional Spatial-Temporal Reasoning for Video Question Answering
Video question answering (VideoQA) is challenging because it requires the model to extract multi-level visual concepts, from local objects to global actions, from complex events and to combine them for compositional reasoning. Existing works represent the video with fixed-duration clip features that make the mo...
Published in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024), dated: 02., pages 1109-1121 |
---|---|
Format: | Online article |
Language: | English |
Published: | 2024 |
Access to the parent work: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |
Keywords: | Journal Article |