|
|
|
|
LEADER |
01000caa a22002652 4500 |
001 |
NLM306068605 |
003 |
DE-627 |
005 |
20240229162519.0 |
007 |
cr uuu---uuuuu |
008 |
231225s2020 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TIP.2020.2963950
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1308.xml
|
035 |
|
|
|a (DE-627)NLM306068605
|
035 |
|
|
|a (NLM)32011250
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Zhao, Zhou
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks
|
264 |
|
1 |
|c 2020
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Revised 27.02.2024
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status Publisher
|
520 |
|
|
|a As a challenging task in visual information retrieval, open-ended long-form video question answering automatically generates a natural language answer from the referenced video content according to the given question. However, existing video question answering work mainly focuses on short-form video and may not transfer effectively to long-form video question answering, because it models the semantic representation of long-form video content insufficiently. In this paper, we study the problem of open-ended long-form video question answering from the viewpoint of hierarchical multi-modal conditional adversarial network learning. We propose a hierarchical attentional encoder network to learn the joint representation of long-form video content and the given question with adaptive video segmentation. We then devise a reinforced decoder network to generate the natural language answer for open-ended video question answering with multi-modal conditional adversarial network learning. We construct three large-scale open-ended video question answering datasets. Extensive experiments validate the effectiveness of our method.
|
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Xiao, Shuwen
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Song, Zehan
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Lu, Chujie
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Xiao, Jun
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Zhuang, Yueting
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
|d 1992
|g (2020) vom: 29. Jan.
|w (DE-627)NLM09821456X
|x 1941-0042
|7 nnns
|
773 |
1 |
8 |
|g year:2020
|g day:29
|g month:01
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TIP.2020.2963950
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|j 2020
|b 29
|c 01
|