Transformer-Empowered Invariant Grounding for Video Question Answering

Video Question Answering (VideoQA) is the task of answering questions about a video. At its core is the understanding of the alignments between video scenes and question semantics to yield the answer. In leading VideoQA models, the typical learning objective, empirical risk minimization (ERM), tends...


Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2023), dated: 09 Aug.
First author: Li, Yicong (Author)
Other authors: Wang, Xiang, Xiao, Junbin, Ji, Wei, Chua, Tat-Seng
Format: Online article
Language: English
Published: 2023
Parent work: IEEE transactions on pattern analysis and machine intelligence
Keywords: Journal Article