Image Captioning with End-to-end Attribute Detection and Subsequent Attributes Prediction

Semantic attention has been shown to be effective in improving the performance of image captioning. The core of semantic attention based methods is to drive the model to attend to semantically important words, or attributes. In previous works, the attribute detector and the captioning network are us...

Bibliographic Details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - (2020), 30 Jan.
Main Author:	Huang, Yiqing (Author)
Other Authors:	Chen, Jiansheng, Ouyang, Wanli, Wan, Weitao, Xue, Youze
Format:	Online Article
Language:	English
Published:	2020
Access to parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article


LEADER	01000caa a22002652 4500
001	NLM306076128
003	DE-627
005	20240229162520.0
007	cr uuu---uuuuu
008	231225s2020 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2020.2969330 \|2 doi
028	5	2	\|a pubmed24n1308.xml
035			\|a (DE-627)NLM306076128
035			\|a (NLM)32012014
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Huang, Yiqing \|e verfasserin \|4 aut
245	1	0	\|a Image Captioning with End-to-end Attribute Detection and Subsequent Attributes Prediction
264		1	\|c 2020
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 27.02.2024
500			\|a published: Print-Electronic
500			\|a Citation Status Publisher
520			\|a Semantic attention has been shown to be effective in improving the performance of image captioning. The core of semantic attention based methods is to drive the model to attend to semantically important words, or attributes. In previous works, the attribute detector and the captioning network are usually independent, leading to the insufficient usage of the semantic information. Also, all the detected attributes, no matter whether they are appropriate for the linguistic context at the current step, are attended to through the whole caption generation process. This may sometimes disrupt the captioning model to attend to incorrect visual concepts. To solve these problems, we introduce two end-to-end trainable modules to closely couple attribute detection with image captioning as well as prompt the effective uses of attributes by predicting appropriate attributes at each time step. The multimodal attribute detector (MAD) module improves the attribute detection accuracy by using not only the image features but also the word embedding of attributes already existing in most captioning models. MAD models the similarity between the semantics of attributes and the image object features to facilitate accurate detection. The subsequent attribute predictor (SAP) module dynamically predicts a concise attribute subset at each time step to mitigate the diversity of image attributes. Compared to previous attribute based methods, our approach enhances the explainability in how the attributes affect the generated words and achieves a state-of-the-art single model performance of 128.8 CIDEr-D on the MSCOCO dataset. Extensive experiments on the MSCOCO dataset show that our proposal actually improves the performances in both image captioning and attribute detection simultaneously. The codes are available at: https://github.com/RubickH/Image-Captioning-with-MAD-and-SAP
650		4	\|a Journal Article
700	1		\|a Chen, Jiansheng \|e verfasserin \|4 aut
700	1		\|a Ouyang, Wanli \|e verfasserin \|4 aut
700	1		\|a Wan, Weitao \|e verfasserin \|4 aut
700	1		\|a Xue, Youze \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g (2020) vom: 30. Jan. \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnns
773	1	8	\|g year:2020 \|g day:30 \|g month:01
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2020.2969330 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|j 2020 \|b 30 \|c 01