Structured Multimodal Attentions for TextVQA
Text-based Visual Question Answering (TextVQA) is a recently proposed challenge requiring models to read text in images and answer natural language questions by jointly reasoning over the question, textual information and visual content. Introduction of this new modality - Optical Character Recognitio...
Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), no. 12 (2 Dec.), pages 9603-9614
Main author: Gao, Chenyu (Author)
Other authors: Zhu, Qi; Wang, Peng; Li, Hui; Liu, Yuliang; Hengel, Anton van den; Wu, Qi
Format: Online article
Language: English
Published: 2022
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't