|
|
|
|
LEADER |
01000naa a22002652 4500 |
001 |
NLM303982292 |
003 |
DE-627 |
005 |
20231225114008.0 |
007 |
cr uuu---uuuuu |
008 |
231225s2022 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TPAMI.2019.2956699
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1013.xml
|
035 |
|
|
|a (DE-627)NLM303982292
|
035 |
|
|
|a (NLM)31796387
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Zhou, Yiyi
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Plenty is Plague
|b Fine-Grained Learning for Visual Question Answering
|
264 |
|
1 |
|c 2022
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Completed 28.03.2022
|
500 |
|
|
|a Date Revised 31.05.2022
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status MEDLINE
|
520 |
|
|
|a Visual Question Answering (VQA) has attracted extensive research focus recently. Along with the ever-increasing data scale and model complexity, the enormous training cost has become an emerging challenge for VQA. In this article, we show that such a massive training cost is indeed a plague. In contrast, a fine-grained design of the learning paradigm can be extremely beneficial in terms of both training efficiency and model accuracy. In particular, we argue that there exist two essential and unexplored issues in the existing VQA training paradigm that randomly samples data in each epoch, namely, the "difficulty diversity" and the "label redundancy". Concretely, "difficulty diversity" refers to the varying difficulty levels of different question types, while "label redundancy" refers to the redundant and noisy labels contained in each individual question type. To tackle these two issues, in this article we propose a fine-grained VQA learning paradigm with an actor-critic based learning agent, termed FG-A1C. Instead of using all training data from scratch, FG-A1C includes a learning agent that adaptively and intelligently schedules the most difficult question types in each training epoch. Subsequently, two curriculum learning based schemes are further designed to identify the most useful data to be learned within each individual question type. We conduct extensive experiments on the VQA2.0 and VQA-CP v2 datasets, which demonstrate the significant benefits of our approach. For instance, on VQA-CP v2, with less than 75 percent of the training data, our learning paradigms can help the model achieve better performance than using the whole dataset. Meanwhile, we also show the effectiveness of our method in guiding data labeling. Finally, the proposed paradigm can be seamlessly integrated with any cutting-edge VQA model without modifying its structure.
|
650 |
|
4 |
|a Journal Article
|
650 |
|
4 |
|a Research Support, Non-U.S. Gov't
|
700 |
1 |
|
|a Ji, Rongrong
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Sun, Xiaoshuai
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Su, Jinsong
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Meng, Deyu
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Gao, Yue
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Shen, Chunhua
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on pattern analysis and machine intelligence
|d 1979
|g 44(2022), 2 vom: 01. Feb., Seite 697-709
|w (DE-627)NLM098212257
|x 1939-3539
|7 nnns
|
773 |
1 |
8 |
|g volume:44
|g year:2022
|g number:2
|g day:01
|g month:02
|g pages:697-709
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TPAMI.2019.2956699
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 44
|j 2022
|e 2
|b 01
|c 02
|h 697-709
|