Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture the temporality, causality, and dynamics of events spanning the video. In this work, to address the task of event-level visual question...
| Published in: | IEEE Transactions on Pattern Analysis and Machine Intelligence (1979-), vol. 45 (2023), no. 10, 27 Oct. 2023, pp. 11624-11641 |
|---|---|
| Format: | Online article |
| Language: | English |
| Published: | 2023 |
| Collection: | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Subjects: | Journal Article |