Compositional Attention Networks with Two-Stream Fusion for Video Question Answering
Given a video, Video Question Answering (VideoQA) aims at answering arbitrary free-form questions about the video content in natural language. A successful VideoQA framework usually has the following two key components: 1) a discriminative video encoder that learns the effective video representation...
Published in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - (2019), dated: 16 Sept. |
---|---|
Main author: | |
Other authors: | , , |
Format: | Online article |
Language: | English |
Published: | 2019 |
Access to the parent work: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |
Subjects: | Journal Article |
Available online | Full text |