Discriminative Style Learning for Cross-Domain Image Captioning

The cross-domain image captioning, which is trained on a source domain and generalized to other domains, usually faces the large domain shift problem. Although prior work has attempted to leverage both paired source and unpaired target data to minimize this shift, the performance is still unsatisfac...

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), day 27, pages 1723-1736
Main author: Yuan, Jin (Author)
Other authors: Zhu, Shuai, Huang, Shuyin, Zhang, Hanwang, Xiao, Yaoqiang, Li, Zhiyong, Wang, Meng
Format: Online article
Language: English
Published: 2022
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM336194579
003 DE-627
005 20231225231602.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2022.3145158  |2 doi 
028 5 2 |a pubmed24n1120.xml 
035 |a (DE-627)NLM336194579 
035 |a (NLM)35085078 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Yuan, Jin  |e verfasserin  |4 aut 
245 1 0 |a Discriminative Style Learning for Cross-Domain Image Captioning 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 09.02.2022 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a The cross-domain image captioning, which is trained on a source domain and generalized to other domains, usually faces the large domain shift problem. Although prior work has attempted to leverage both paired source and unpaired target data to minimize this shift, the performance is still unsatisfactory. One main reason lies in the large discrepancy in language expression between two domains, where diverse language styles are adopted to describe an image from different views, resulting in different semantic descriptions for an image. To tackle this problem, this paper proposes a Style-based Cross-domain Image Captioner (SCIC) which incorporates the discriminative style information into the encoder-decoder framework, and interprets an image as a special sentence according to external style instructions. Technically, we design a novel "Instruction-based LSTM", which adds the instruct gate to collect a style instruction, and then outputs a specified format according to that instruction. Two objectives are designed to train I-LSTM: 1) generating correct image descriptions and 2) generating correct styles, thus the model is expected to accurately capture the semantic meanings of an image by the special caption as well as understand the syntactic structure of the caption. We use MS-COCO as the source domain, and Oxford-102, CUB-200, Flickr30k as the target domains. Experimental results demonstrate that our model consistently outperforms the previous methods, and the style information incorporating with I-LSTM significantly improves the performance, with 5% CIDEr improvements at least on all datasets 
650 4 |a Journal Article 
700 1 |a Zhu, Shuai  |e verfasserin  |4 aut 
700 1 |a Huang, Shuyin  |e verfasserin  |4 aut 
700 1 |a Zhang, Hanwang  |e verfasserin  |4 aut 
700 1 |a Xiao, Yaoqiang  |e verfasserin  |4 aut 
700 1 |a Li, Zhiyong  |e verfasserin  |4 aut 
700 1 |a Wang, Meng  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 31(2022) vom: 27., Seite 1723-1736  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:31  |g year:2022  |g day:27  |g pages:1723-1736 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2022.3145158  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 31  |j 2022  |b 27  |h 1723-1736