Plenty is Plague : Fine-Grained Learning for Visual Question Answering

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), 2, dated 01 Feb., pages 697-709
Main Author: Zhou, Yiyi (Author)
Other Authors: Ji, Rongrong, Sun, Xiaoshuai, Su, Jinsong, Meng, Deyu, Gao, Yue, Shen, Chunhua
Format: Online article
Language: English
Published: 2022
Access to parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article, Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM303982292
003 DE-627
005 20231225114008.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2019.2956699  |2 doi 
028 5 2 |a pubmed24n1013.xml 
035 |a (DE-627)NLM303982292 
035 |a (NLM)31796387 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Zhou, Yiyi  |e verfasserin  |4 aut 
245 1 0 |a Plenty is Plague  |b Fine-Grained Learning for Visual Question Answering 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 28.03.2022 
500 |a Date Revised 31.05.2022 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a Visual Question Answering (VQA) has attracted extensive research focus recently. Along with the ever-increasing data scale and model complexity, the enormous training cost has become an emerging challenge for VQA. In this article, we show that such a massive training cost is indeed a plague. In contrast, a fine-grained design of the learning paradigm can be extremely beneficial in terms of both training efficiency and model accuracy. In particular, we argue that there exist two essential and unexplored issues in the existing VQA training paradigm that randomly samples data in each epoch, namely, the "difficulty diversity" and the "label redundancy". Concretely, "difficulty diversity" refers to the varying difficulty levels of different question types, while "label redundancy" refers to the redundant and noisy labels contained in individual question types. To tackle these two issues, we propose a fine-grained VQA learning paradigm with an actor-critic based learning agent, termed FG-A1C. Instead of using all training data from scratch, FG-A1C includes a learning agent that adaptively and intelligently schedules the most difficult question types in each training epoch. Subsequently, two curriculum learning based schemes are further designed to identify the most useful data to be learned within each individual question type. We conduct extensive experiments on the VQA2.0 and VQA-CP v2 datasets, which demonstrate the significant benefits of our approach. For instance, on VQA-CP v2, with less than 75 percent of the training data, our learning paradigm helps the model achieve better performance than using the whole dataset. We also show the effectiveness of our method in guiding data labeling. Finally, the proposed paradigm can be seamlessly integrated with any cutting-edge VQA model without modifying its structure.
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Ji, Rongrong  |e verfasserin  |4 aut 
700 1 |a Sun, Xiaoshuai  |e verfasserin  |4 aut 
700 1 |a Su, Jinsong  |e verfasserin  |4 aut 
700 1 |a Meng, Deyu  |e verfasserin  |4 aut 
700 1 |a Gao, Yue  |e verfasserin  |4 aut 
700 1 |a Shen, Chunhua  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 44(2022), 2 vom: 01. Feb., Seite 697-709  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:44  |g year:2022  |g number:2  |g day:01  |g month:02  |g pages:697-709 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2019.2956699  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 44  |j 2022  |e 2  |b 01  |c 02  |h 697-709