|
|
|
|
LEADER |
01000naa a22002652 4500 |
001 |
NLM320385752 |
003 |
DE-627 |
005 |
20231225173447.0 |
007 |
cr uuu---uuuuu |
008 |
231225s2021 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TIP.2021.3051756
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1067.xml
|
035 |
|
|
|a (DE-627)NLM320385752
|
035 |
|
|
|a (NLM)33476268
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Gu, Mao
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Graph-Based Multi-Interaction Network for Video Question Answering
|
264 |
|
1 |
|c 2021
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Revised 15.02.2021
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status PubMed-not-MEDLINE
|
520 |
|
|
|a Video question answering is an important task combining both Natural Language Processing and Computer Vision, which requires a machine to obtain a thorough understanding of the video. Most existing approaches simply capture spatio-temporal information in videos by using a combination of recurrent and convolutional neural networks. Nonetheless, most previous work focuses only on salient frames or regions, and so normally misses some significant details, such as potential location and action relations. In this paper, we propose a new method called Graph-based Multi-interaction Network for video question answering. In our model, a new attention mechanism named multi-interaction is designed to capture both element-wise and segment-wise sequence interactions simultaneously, which can be found between and inside the multi-modal inputs. Moreover, we propose a graph-based relation-aware neural network to obtain a more fine-grained visual representation, which can explore the relationships and dependencies between objects spatially and temporally. We evaluate our method on TGIF-QA and two other video QA datasets. The qualitative and quantitative experimental results show the effectiveness of our model, which achieves state-of-the-art performance.
|
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Zhao, Zhou
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Jin, Weike
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Hong, Richang
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Wu, Fei
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
|d 1992
|g 30(2021) vom: 21., Seite 2758-2770
|w (DE-627)NLM09821456X
|x 1941-0042
|7 nnns
|
773 |
1 |
8 |
|g volume:30
|g year:2021
|g day:21
|g pages:2758-2770
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TIP.2021.3051756
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 30
|j 2021
|b 21
|h 2758-2770
|