Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool

In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI, is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps ...

Detailed description

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 42(2020), 2, dated 12 Feb., pages 460-474
Main author: Liu, Feng (author)
Other authors: Xiang, Tao; Hospedales, Timothy M; Yang, Wankou; Sun, Changyin
Format: Online article
Language: English
Published: 2020
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM290514975
003 DE-627
005 20231225064914.0
007 cr uuu---uuuuu
008 231225s2020 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2018.2880185  |2 doi 
028 5 2 |a pubmed24n0968.xml 
035 |a (DE-627)NLM290514975 
035 |a (NLM)30418897 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Liu, Feng  |e verfasserin  |4 aut 
245 1 0 |a Inverse Visual Question Answering  |b A New Benchmark and VQA Diagnosis Tool 
264 1 |c 2020 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 10.03.2020 
500 |a Date Revised 10.03.2020 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI, is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps 'understand' less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution [1]. In this paper we propose the inverse problem of VQA (iVQA). The iVQA task is to generate a question that corresponds to a given image and answer pair. We propose a variational iVQA model that can generate diverse, grammatically correct and content correlated questions that match the given answer. Based on this model, we show that iVQA is an interesting benchmark for visuo-linguistic understanding, and a more challenging alternative to VQA because an iVQA model needs to understand the image better to be successful. As a second contribution, we show how to use iVQA in a novel reinforcement learning framework to diagnose any existing VQA model by way of exposing its belief set: the set of question-answer pairs that the VQA model would predict true for a given image. This provides a completely new window into what VQA models 'believe' about images. We show that existing VQA models have more erroneous beliefs than previously thought, revealing their intrinsic weaknesses. Suggestions are then made on how to address these weaknesses going forward 
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Xiang, Tao  |e verfasserin  |4 aut 
700 1 |a Hospedales, Timothy M  |e verfasserin  |4 aut 
700 1 |a Yang, Wankou  |e verfasserin  |4 aut 
700 1 |a Sun, Changyin  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 42(2020), 2 vom: 12. Feb., Seite 460-474  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:42  |g year:2020  |g number:2  |g day:12  |g month:02  |g pages:460-474 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2018.2880185  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 42  |j 2020  |e 2  |b 12  |c 02  |h 460-474