Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering

Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning the video. In this work, to address the task of event-level visual question...


Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 10 of 27 Oct., pages 11624-11641
Main author: Liu, Yang (Author)
Other authors: Li, Guanbin; Lin, Liang
Format: Online article
Language: English
Published: 2023
Collection access: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article