Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning

Learning to generate continuous linguistic descriptions for multi-subject interactive videos in great details has particular applications in team sports auto-narrative. In contrast to traditional video caption, this task is more challenging as it requires simultaneous modeling of fine-grained indivi...


Bibliographic Details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), no. 2 (15 Feb.), pages 666-683
Main author:	Yan, Yichao (Author)
Other authors:	Zhuang, Ning, Ni, Bingbing, Zhang, Jian, Xu, Minghao, Zhang, Qiang, Zhang, Zheng, Cheng, Shuo, Tian, Qi, Xu, Yi, Yang, Xiaokang, Zhang, Wenjun
Format:	Online article
Language:	English
Published:	2022
Access to the parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article; Research Support, Non-U.S. Gov't


LEADER	01000naa a22002652 4500
001	NLM302202749
003	DE-627
005	20231225110226.0
007	cr uuu---uuuuu
008	231225s2022 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TPAMI.2019.2946823 \|2 doi
028	5	2	\|a pubmed24n1007.xml
035			\|a (DE-627)NLM302202749
035			\|a (NLM)31613750
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Yan, Yichao \|e verfasserin \|4 aut
245	1	0	\|a Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning
264		1	\|c 2022
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 28.03.2022
500			\|a Date Revised 01.04.2022
500			\|a published: Print-Electronic
500			\|a Citation Status MEDLINE
520			\|a Learning to generate continuous linguistic descriptions of multi-subject interactive videos in great detail has particular applications in team sports auto-narrative. Compared with traditional video captioning, this task is more challenging because it requires simultaneously modeling fine-grained individual actions, uncovering the spatio-temporal dependency structures of frequent group interactions, and then accurately mapping these complex interaction details into long and detailed commentary. To explicitly address these challenges, we propose a novel framework, Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module is proposed to extract interactive actions among subjects in a progressive way, encoding both intra- and inter-team interactions. Based on these multi-granular representations, a multi-granular attention module is developed to consider action/event descriptions at multiple spatio-temporal resolutions. Both modules are integrated seamlessly and work collaboratively to generate the final narrative. To facilitate reproducible research, we also collect a new video dataset from YouTube.com called the Sports Video Narrative dataset (SVN). It is a novel direction as it contains 6K team sports videos (i.e., NBA basketball games) with 10K ground-truth narratives (e.g., sentences). Furthermore, as previous metrics such as METEOR (used in the coarse-grained video captioning task) do not cope well with the fine-grained sports narrative task, we develop a novel evaluation metric named Fine-grained Captioning Evaluation (FCE), which measures how accurately the generated linguistic description reflects fine-grained action details as well as the overall spatio-temporal interactional structure. Extensive experiments on our SVN dataset have demonstrated the effectiveness of the proposed framework for fine-grained team sports video auto-narrative.
650		4	\|a Journal Article
650		4	\|a Research Support, Non-U.S. Gov't
700	1		\|a Zhuang, Ning \|e verfasserin \|4 aut
700	1		\|a Ni, Bingbing \|e verfasserin \|4 aut
700	1		\|a Zhang, Jian \|e verfasserin \|4 aut
700	1		\|a Xu, Minghao \|e verfasserin \|4 aut
700	1		\|a Zhang, Qiang \|e verfasserin \|4 aut
700	1		\|a Zhang, Zheng \|e verfasserin \|4 aut
700	1		\|a Cheng, Shuo \|e verfasserin \|4 aut
700	1		\|a Tian, Qi \|e verfasserin \|4 aut
700	1		\|a Xu, Yi \|e verfasserin \|4 aut
700	1		\|a Yang, Xiaokang \|e verfasserin \|4 aut
700	1		\|a Zhang, Wenjun \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on pattern analysis and machine intelligence \|d 1979 \|g 44(2022), 2 vom: 15. Feb., Seite 666-683 \|w (DE-627)NLM098212257 \|x 1939-3539 \|7 nnns
773	1	8	\|g volume:44 \|g year:2022 \|g number:2 \|g day:15 \|g month:02 \|g pages:666-683
856	4	0	\|u http://dx.doi.org/10.1109/TPAMI.2019.2946823 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 44 \|j 2022 \|e 2 \|b 15 \|c 02 \|h 666-683