Image Captioning with End-to-end Attribute Detection and Subsequent Attributes Prediction

Semantic attention has been shown to be effective in improving the performance of image captioning. The core of semantic attention based methods is to drive the model to attend to semantically important words, or attributes. In previous works, the attribute detector and the captioning network are us...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - (2020) vom: 30. Jan.
1. Verfasser: Huang, Yiqing (VerfasserIn)
Weitere Verfasser: Chen, Jiansheng, Ouyang, Wanli, Wan, Weitao, Xue, Youze
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2020
Zugriff auf das übergeordnete Werk:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Schlagworte:Journal Article
LEADER 01000caa a22002652 4500
001 NLM306076128
003 DE-627
005 20240229162520.0
007 cr uuu---uuuuu
008 231225s2020 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2020.2969330  |2 doi 
028 5 2 |a pubmed24n1308.xml 
035 |a (DE-627)NLM306076128 
035 |a (NLM)32012014 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Huang, Yiqing  |e verfasserin  |4 aut 
245 1 0 |a Image Captioning with End-to-end Attribute Detection and Subsequent Attributes Prediction 
264 1 |c 2020 
336 |a Text  |b txt  |2 rdacontent 
337 |a ƒaComputermedien  |b c  |2 rdamedia 
338 |a ƒa Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 27.02.2024 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Semantic attention has been shown to be effective in improving the performance of image captioning. The core of semantic attention based methods is to drive the model to attend to semantically important words, or attributes. In previous works, the attribute detector and the captioning network are usually independent, leading to the insufficient usage of the semantic information. Also, all the detected attributes, no matter whether they are appropriate for the linguistic context at the current step, are attended to through the whole caption generation process. This may sometimes disrupt the captioning model to attend to incorrect visual concepts. To solve these problems, we introduce two end-to-end trainable modules to closely couple attribute detection with image captioning as well as prompt the effective uses of attributes by predicting appropriate attributes at each time step. The multimodal attribute detector (MAD) module improves the attribute detection accuracy by using not only the image features but also the word embedding of attributes already existing in most captioning models. MAD models the similarity between the semantics of attributes and the image object features to facilitate accurate detection. The subsequent attribute predictor (SAP) module dynamically predicts a concise attribute subset at each time step to mitigate the diversity of image attributes. Compared to previous attribute based methods, our approach enhances the explainability in how the attributes affect the generated words and achieves a state-of-the-art single model performance of 128.8 CIDEr-D on the MSCOCO dataset. Extensive experiments on the MSCOCO dataset show that our proposal actually improves the performances in both image captioning and attribute detection simultaneously. The codes are available at: https://github.com/ RubickH/Image-Captioning-with-MAD-and-SAP 
650 4 |a Journal Article 
700 1 |a Chen, Jiansheng  |e verfasserin  |4 aut 
700 1 |a Ouyang, Wanli  |e verfasserin  |4 aut 
700 1 |a Wan, Weitao  |e verfasserin  |4 aut 
700 1 |a Xue, Youze  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g (2020) vom: 30. Jan.  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g year:2020  |g day:30  |g month:01 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2020.2969330  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |j 2020  |b 30  |c 01