Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image...
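The abstract describes an attribute-based pipeline: a multi-label classifier first predicts high-level semantic concepts from CNN image features, and those attribute probabilities (rather than raw features) then condition an RNN caption decoder. As a rough illustration only, the sketch below shows that wiring with a toy NumPy RNN; all dimensions, weights, and the greedy decoder are hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions for illustration, not from the paper)
FEAT = 16    # CNN image-feature dimension
ATTR = 8     # number of semantic attributes (e.g. "dog", "red", "outdoors")
HID = 12     # RNN hidden-state size
VOCAB = 20   # caption vocabulary size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 1) Attribute predictor: a multi-label classifier over CNN features.
W_attr = rng.normal(0, 0.1, (ATTR, FEAT))

def predict_attributes(cnn_feat):
    return sigmoid(W_attr @ cnn_feat)  # per-attribute probabilities in (0, 1)

# 2) Caption decoder: a plain RNN whose initial hidden state is seeded
#    by the attribute vector instead of the raw CNN features.
W_init = rng.normal(0, 0.1, (HID, ATTR))
W_hh = rng.normal(0, 0.1, (HID, HID))
W_xh = rng.normal(0, 0.1, (HID, VOCAB))
W_out = rng.normal(0, 0.1, (VOCAB, HID))

def decode(attr_probs, steps=3):
    h = np.tanh(W_init @ attr_probs)      # attributes initialize the RNN
    token = 0                             # <start> token id
    caption = []
    for _ in range(steps):
        x = np.eye(VOCAB)[token]          # one-hot previous token
        h = np.tanh(W_xh @ x + W_hh @ h)  # vanilla RNN step
        token = int(np.argmax(W_out @ h)) # greedy decoding
        caption.append(token)
    return caption

cnn_feat = rng.normal(size=FEAT)          # stand-in for a real CNN output
attrs = predict_attributes(cnn_feat)
caption = decode(attrs)
print(caption)
```

The paper's second contribution, feeding external knowledge-base text through the same mechanism for VQA, would amount to concatenating an encoded knowledge vector with `attrs` before the decoder is initialized.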

Detailed Description

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 40(2018), 6, 02 June, pages 1367-1381
Main Author: Wu, Qi (Author)
Other Authors: Shen, Chunhua, Wang, Peng, Dick, Anthony, van den Hengel, Anton
Format: Online Article
Language: English
Published: 2018
Access to parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article, Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM272548359
003 DE-627
005 20231224235112.0
007 cr uuu---uuuuu
008 231224s2018 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2017.2708709  |2 doi 
028 5 2 |a pubmed24n0908.xml 
035 |a (DE-627)NLM272548359 
035 |a (NLM)28574341 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wu, Qi  |e verfasserin  |4 aut 
245 1 0 |a Image Captioning and Visual Question Answering Based on Attributes and External Knowledge 
264 1 |c 2018 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 04.04.2019 
500 |a Date Revised 04.04.2019 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state-of-the-art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. It particularly allows questions to be asked where the image alone does not contain the information required to select the appropriate answer. Our final model achieves the best reported results for both image captioning and visual question answering on several of the major benchmark datasets.
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Shen, Chunhua  |e verfasserin  |4 aut 
700 1 |a Wang, Peng  |e verfasserin  |4 aut 
700 1 |a Dick, Anthony  |e verfasserin  |4 aut 
700 1 |a van den Hengel, Anton  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 40(2018), 6 vom: 02. Juni, Seite 1367-1381  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:40  |g year:2018  |g number:6  |g day:02  |g month:06  |g pages:1367-1381 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2017.2708709  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 40  |j 2018  |e 6  |b 02  |c 06  |h 1367-1381