Knowing What to Learn : A Metric-Oriented Focal Mechanism for Image Captioning

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), day 01, pages 4321-4335
First author: Ji, Jiayi (author)
Other authors: Ma, Yiwei, Sun, Xiaoshuai, Zhou, Yiyi, Wu, Yongjian, Ji, Rongrong
Format: Online article
Language: English
Published: 2022
Parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Keywords: Journal Article
LEADER 01000naa a22002652 4500
001 NLM342489852
003 DE-627
005 20231226014232.0
007 cr uuu---uuuuu
008 231226s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2022.3183434  |2 doi 
028 5 2 |a pubmed24n1141.xml 
035 |a (DE-627)NLM342489852 
035 |a (NLM)35727782 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Ji, Jiayi  |e author  |4 aut 
245 1 0 |a Knowing What to Learn  |b A Metric-Oriented Focal Mechanism for Image Captioning 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Date Revised 01.07.2022 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Despite considerable progress, image captioning still suffers from a large gap in quality between easy and hard examples, which existing methods leave unexploited. To address this issue, we explore hard example mining in image captioning and propose a simple yet effective mechanism that instructs the model to pay more attention to hard examples, thereby improving performance in both general and complex scenarios. We first propose a novel learning strategy, termed Metric-oriented Focal Mechanism (MFM), for hard example mining in image captioning. Unlike existing strategies for classification tasks, MFM can use the generative metrics of image captioning to measure the difficulty of examples and then up-weight the rewards of hard examples during training. To make MFM applicable to different datasets without tedious parameter tuning, we further introduce an adaptive reward metric called Effective CIDEr (ECIDEr), which considers the data distribution of easy and hard examples during reward estimation. Extensive experiments on the MS COCO benchmark show that MFM significantly improves caption quality for hard examples while maintaining performance on easy ones. Equipped with ECIDEr-based MFM, the current SOTA method DLCT (Luo et al., 2021) outperforms all existing methods and achieves new state-of-the-art performance in both offline and online testing, i.e., 134.3 CIDEr on the offline test and 136.1 CIDEr on the online test of MS COCO. To validate the generalization ability of ECIDEr-based MFM, we also apply it to another dataset, Flickr30k, where superior performance gains are also obtained. 
650 4 |a Journal Article 
700 1 |a Ma, Yiwei  |e author  |4 aut 
700 1 |a Sun, Xiaoshuai  |e author  |4 aut 
700 1 |a Zhou, Yiyi  |e author  |4 aut 
700 1 |a Wu, Yongjian  |e author  |4 aut 
700 1 |a Ji, Rongrong  |e author  |4 aut 
773 0 8 |i Contained in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 31(2022), day: 01, pages 4321-4335  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:31  |g year:2022  |g day:01  |g pages:4321-4335 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2022.3183434  |3 Full text 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 31  |j 2022  |b 01  |h 4321-4335
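The abstract (field 520) says MFM measures example difficulty with a captioning metric and up-weights the rewards of hard examples, but this record gives no formula. The Python below is a minimal sketch under stated assumptions, not the authors' MFM or ECIDEr. It assumes self-critical (SCST-style) training, where each image's advantage is the sampled caption's metric score minus a greedy-baseline score. It treats a low baseline score as a hard example and applies a hypothetical focal-style weight 1 + alpha * (1 - p)^gamma, with p the score normalized by the batch maximum. The function names, `gamma`, `alpha`, and the normalization are illustrative choices only.

```python
from typing import List, Sequence


def focal_weights(scores: Sequence[float], gamma: float = 2.0, alpha: float = 1.0) -> List[float]:
    """Hypothetical focal-style weights: lower metric score -> harder example -> larger weight.

    w = 1 + alpha * (1 - p) ** gamma, with p = score / batch max, clamped to [0, 1].
    Illustrative assumption only; not the published MFM/ECIDEr formula.
    """
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        # Every example scored zero (metric scores assumed non-negative): all equally hard.
        return [1.0 + alpha for _ in scores]
    weights = []
    for s in scores:
        p = min(max(s / top, 0.0), 1.0)  # normalized "easiness" in [0, 1]
        weights.append(1.0 + alpha * (1.0 - p) ** gamma)
    return weights


def weighted_advantages(
    sample_scores: Sequence[float],
    baseline_scores: Sequence[float],
    gamma: float = 2.0,
    alpha: float = 1.0,
) -> List[float]:
    """Scale self-critical advantages (sample score minus greedy-baseline score) by focal weights.

    Difficulty is judged from the baseline caption's score (an assumption of this sketch).
    """
    if len(sample_scores) != len(baseline_scores):
        raise ValueError("sample_scores and baseline_scores must have equal length")
    weights = focal_weights(baseline_scores, gamma, alpha)
    return [w * (s - b) for w, s, b in zip(weights, sample_scores, baseline_scores)]


if __name__ == "__main__":
    # Two images: the first is "easy" (high baseline score), the second is "hard".
    adv = weighted_advantages([1.3, 0.5], [1.2, 0.3])
    print(adv)  # approximately [0.1, 0.3125]
```

The additive "1 + ..." form keeps easy examples at a weight near 1 instead of suppressing them, which loosely matches the abstract's claim that performance on easy examples is maintained; a classic focal loss would instead shrink easy-example weights toward zero.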