|
|
|
|
LEADER |
01000naa a22002652 4500 |
001 |
NLM347306128 |
003 |
DE-627 |
005 |
20231226033522.0 |
007 |
cr uuu---uuuuu |
008 |
231226s2022 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TIP.2022.3211467
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1157.xml
|
035 |
|
|
|a (DE-627)NLM347306128
|
035 |
|
|
|a (NLM)36215365
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Nguyen, Thanh-Son
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Effective Multimodal Encoding for Image Paragraph Captioning
|
264 |
|
1 |
|c 2022
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Revised 19.10.2022
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status PubMed-not-MEDLINE
|
520 |
|
|
|a In this paper, we present a regularization-based image paragraph generation method. We propose a novel multimodal encoding generator (MEG) to generate effective multimodal encoding that captures not only an individual sentence but also visual and paragraph-sequential information. By utilizing the encoding generated by MEG, we regularize a paragraph generation model that allows us to improve the results of the captioning model in all the evaluation metrics. With the support of the proposed MEG model for regularization, our paragraph generation model obtains state-of-the-art results on the Stanford paragraph dataset once further optimized with reinforcement learning. Moreover, we perform extensive empirical analysis on the capabilities of MEG encoding. A qualitative visualization based on t-distributed stochastic neighbor embedding (t-SNE) illustrates that sentence encoding generated by MEG captures some level of semantic information. We also demonstrate that the MEG encoding captures meaningful textual and visual information by performing multimodal sentence retrieval tasks and image instance retrieval given a paragraph query.
|
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Fernando, Basura
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
|d 1992
|g 31(2022) vom: 10., Seite 6381-6395
|w (DE-627)NLM09821456X
|x 1941-0042
|7 nnns
|
773 |
1 |
8 |
|g volume:31
|g year:2022
|g day:10
|g pages:6381-6395
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TIP.2022.3211467
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 31
|j 2022
|b 10
|h 6381-6395
|