LEADER 01000caa a22002652c 4500
001    NLM357914201
003    DE-627
005    20250304212053.0
007    cr uuu---uuuuu
008    231226s2023 xx |||||o 00| ||eng c
024 7  |a 10.1109/TPAMI.2023.3284038 |2 doi
028 52 |a pubmed25n1192.xml
035    |a (DE-627)NLM357914201
035    |a (NLM)37289602
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Liu, Yang |e verfasserin |4 aut
245 10 |a Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
264  1 |c 2023
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Revised 06.09.2023
500    |a published: Print-Electronic
500    |a Citation Status PubMed-not-MEDLINE
520    |a Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning over the video. In this work, to address the task of event-level visual question answering, we propose a framework for cross-modal causal relational reasoning. In particular, a set of causal intervention operations is introduced to discover the underlying causal structures across visual and linguistic modalities. Our framework, named Cross-Modal Causal RelatIonal Reasoning (CMCIR), involves three modules: i) Causality-aware Visual-Linguistic Reasoning (CVLR) module for collaboratively disentangling the visual and linguistic spurious correlations via front-door and back-door causal interventions; ii) Spatial-Temporal Transformer (STT) module for capturing the fine-grained interactions between visual and linguistic semantics; iii) Visual-Linguistic Feature Fusion (VLFF) module for learning the global semantic-aware visual-linguistic representations adaptively. Extensive experiments on four event-level datasets demonstrate the superiority of our CMCIR in discovering visual-linguistic causal structures and achieving robust event-level visual question answering
650  4 |a Journal Article
700 1  |a Li, Guanbin |e verfasserin |4 aut
700 1  |a Lin, Liang |e verfasserin |4 aut
773 08 |i Enthalten in |t IEEE transactions on pattern analysis and machine intelligence |d 1979 |g 45(2023), 10 vom: 27. Okt., Seite 11624-11641 |w (DE-627)NLM098212257 |x 1939-3539 |7 nnas
773 18 |g volume:45 |g year:2023 |g number:10 |g day:27 |g month:10 |g pages:11624-11641
856 40 |u http://dx.doi.org/10.1109/TPAMI.2023.3284038 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_NLM
912    |a GBV_ILN_350
951    |a AR
952    |d 45 |j 2023 |e 10 |b 27 |c 10 |h 11624-11641