Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning

Learning to generate continuous linguistic descriptions of multi-subject interactive videos in great detail has particular applications in team sports auto-narrative. In contrast to traditional video captioning, this task is more challenging as it requires simultaneous modeling of fine-grained indivi...

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), no. 2, 15 Feb., pages 666-683
Main author: Yan, Yichao (Author)
Other authors: Zhuang, Ning, Ni, Bingbing, Zhang, Jian, Xu, Minghao, Zhang, Qiang, Zhang, Zheng, Cheng, Shuo, Tian, Qi, Xu, Yi, Yang, Xiaokang, Zhang, Wenjun
Format: Online article
Language: English
Published: 2022
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM302202749
003 DE-627
005 20231225110226.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2019.2946823  |2 doi 
028 5 2 |a pubmed24n1007.xml 
035 |a (DE-627)NLM302202749 
035 |a (NLM)31613750 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Yan, Yichao  |e verfasserin  |4 aut 
245 1 0 |a Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 28.03.2022 
500 |a Date Revised 01.04.2022 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a Learning to generate continuous linguistic descriptions of multi-subject interactive videos in great detail has particular applications in team sports auto-narrative. In contrast to traditional video captioning, this task is more challenging because it requires simultaneously modeling fine-grained individual actions, uncovering the spatio-temporal dependency structures of frequent group interactions, and then accurately mapping these complex interaction details into long and detailed commentary. To explicitly address these challenges, we propose a novel framework, Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module is proposed to progressively extract the interactive actions among subjects, encoding both intra- and inter-team interactions. Based on these multi-granular representations, a multi-granular attention module is developed to consider action/event descriptions at multiple spatio-temporal resolutions. Both modules are integrated seamlessly and work collaboratively to generate the final narrative. To facilitate reproducible research, we also collect a new video dataset from YouTube.com, called the Sports Video Narrative (SVN) dataset. It is novel in that it contains 6K team sports videos (i.e., NBA basketball games) with 10K ground-truth narratives (e.g., sentences). Furthermore, because previous metrics such as METEOR (used in the coarse-grained video captioning task) do not cope well with the fine-grained sports narrative task, we develop a novel evaluation metric named Fine-grained Captioning Evaluation (FCE), which measures how accurately the generated linguistic description reflects fine-grained action details as well as the overall spatio-temporal interactional structure. Extensive experiments on our SVN dataset demonstrate the effectiveness of the proposed framework for fine-grained team sports video auto-narrative. 
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Zhuang, Ning  |e verfasserin  |4 aut 
700 1 |a Ni, Bingbing  |e verfasserin  |4 aut 
700 1 |a Zhang, Jian  |e verfasserin  |4 aut 
700 1 |a Xu, Minghao  |e verfasserin  |4 aut 
700 1 |a Zhang, Qiang  |e verfasserin  |4 aut 
700 1 |a Zhang, Zheng  |e verfasserin  |4 aut 
700 1 |a Cheng, Shuo  |e verfasserin  |4 aut 
700 1 |a Tian, Qi  |e verfasserin  |4 aut 
700 1 |a Xu, Yi  |e verfasserin  |4 aut 
700 1 |a Yang, Xiaokang  |e verfasserin  |4 aut 
700 1 |a Zhang, Wenjun  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 44(2022), 2 vom: 15. Feb., Seite 666-683  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:44  |g year:2022  |g number:2  |g day:15  |g month:02  |g pages:666-683 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2019.2946823  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 44  |j 2022  |e 2  |b 15  |c 02  |h 666-683