On Diversity in Image Captioning : Metrics and Methods

Diversity is one of the most important properties in image captioning, as it reflects various expressions of important concepts presented in an image. However, the most popular metrics cannot well evaluate the diversity of multiple captions. In this paper, we first propose a metric to measure the di...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), 2 vom: 02. Feb., Seite 1035-1049
1. Verfasser:	Wang, Qingzhong (VerfasserIn)
Weitere Verfasser:	Wan, Jia, Chan, Antoni B
Format:	Online-Aufsatz
Sprache:	English
Veröffentlicht:	2022
Zugriff auf das übergeordnete Werk:	IEEE transactions on pattern analysis and machine intelligence
Schlagworte:	Journal Article Research Support, Non-U.S. Gov't


LEADER	01000naa a22002652 4500
001	NLM31325205X
003	DE-627
005	20231225150158.0
007	cr uuu---uuuuu
008	231225s2022 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TPAMI.2020.3013834 \|2 doi
028	5	2	\|a pubmed24n1044.xml
035			\|a (DE-627)NLM31325205X
035			\|a (NLM)32749960
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Wang, Qingzhong \|e verfasserin \|4 aut
245	1	0	\|a On Diversity in Image Captioning \|b Metrics and Methods
264		1	\|c 2022
336			\|a Text \|b txt \|2 rdacontent
337			\|a ƒaComputermedien \|b c \|2 rdamedia
338			\|a ƒa Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 28.03.2022
500			\|a Date Revised 01.04.2022
500			\|a published: Print-Electronic
500			\|a Citation Status MEDLINE
520			\|a Diversity is one of the most important properties in image captioning, as it reflects various expressions of important concepts presented in an image. However, the most popular metrics cannot well evaluate the diversity of multiple captions. In this paper, we first propose a metric to measure the diversity of a set of captions, which is derived from latent semantic analysis (LSA), and then kernelize LSA using CIDEr (R. Vedantam et al., 2015) similarity. Compared with mBLEU (R. Shetty et al., 2017), our proposed diversity metrics show a relatively strong correlation to human evaluation. We conduct extensive experiments, finding there is a large gap between the performance of the current state-of-the-art models and human annotations considering both diversity and accuracy; the models that aim to generate captions with higher CIDEr scores normally obtain lower diversity scores, which generally learn to describe images using common words. To bridge this "diversity" gap, we consider several methods for training caption models to generate diverse captions. First, we show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions. Second, we develop approaches that directly optimize our diversity metric and CIDEr score using reinforcement learning. These proposed approaches using reinforcement learning (RL) can be unified into a self-critical (S. J. Rennie et al., 2017) framework with new RL baselines. Third, we combine accuracy and diversity into a single measure using an ensemble matrix, and then maximize the determinant of the ensemble matrix via reinforcement learning to boost diversity and accuracy, which outperforms its counterparts on the oracle test. Finally, inspired by determinantal point processes (DPP), we develop a DPP selection algorithm to select a subset of captions from a large number of candidate captions. The experimental results show that maximizing the determinant of the ensemble matrix outperforms other methods considerably improving diversity and accuracy
650		4	\|a Journal Article
650		4	\|a Research Support, Non-U.S. Gov't
700	1		\|a Wan, Jia \|e verfasserin \|4 aut
700	1		\|a Chan, Antoni B \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on pattern analysis and machine intelligence \|d 1979 \|g 44(2022), 2 vom: 02. Feb., Seite 1035-1049 \|w (DE-627)NLM098212257 \|x 1939-3539 \|7 nnns
773	1	8	\|g volume:44 \|g year:2022 \|g number:2 \|g day:02 \|g month:02 \|g pages:1035-1049
856	4	0	\|u http://dx.doi.org/10.1109/TPAMI.2020.3013834 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 44 \|j 2022 \|e 2 \|b 02 \|c 02 \|h 1035-1049