Reconstructive Sequence-Graph Network for Video Summarization

Exploiting the inner-shot and inter-shot dependencies is essential for key-shot based video summarization. Current approaches mainly model the video as a frame sequence using recurrent neural networks. However, one potential limitation of sequence models is that they focus on capturing...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), 5, 15 May, pages 2793-2801
Main author: Zhao, Bin (author)
Other authors: Li, Haopeng; Lu, Xiaoqiang; Li, Xuelong
Format: Online article
Language: English
Published: 2022
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM323910181
003 DE-627
005 20231225185206.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2021.3072117  |2 doi 
028 5 2 |a pubmed24n1079.xml 
035 |a (DE-627)NLM323910181 
035 |a (NLM)33835915 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Zhao, Bin  |e verfasserin  |4 aut 
245 1 0 |a Reconstructive Sequence-Graph Network for Video Summarization 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 04.04.2022 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Exploiting the inner-shot and inter-shot dependencies is essential for key-shot based video summarization. Current approaches mainly model the video as a frame sequence using recurrent neural networks. However, one potential limitation of sequence models is that they focus on capturing local neighborhood dependencies, while high-order, long-distance dependencies are not fully exploited. In general, the frames in each shot record a certain activity and vary smoothly over time, but multi-hop relationships occur frequently among shots. In this case, both the local and global dependencies are important for understanding the video content. Motivated by this point, we propose a reconstructive sequence-graph network (RSGN) to encode the frames and shots hierarchically as a sequence and a graph, where the frame-level dependencies are encoded by a long short-term memory (LSTM) network, and the shot-level dependencies are captured by a graph convolutional network (GCN). Then, the videos are summarized by exploiting both the local and global dependencies among shots. In addition, a reconstructor is developed to reward the summary generator, so that the generator can be optimized in an unsupervised manner, which mitigates the lack of annotated data in video summarization. Furthermore, under the guidance of the reconstruction loss, the predicted summary can better preserve the main video content and shot-level dependencies. Experimental results on three popular datasets (SumMe, TVSum and VTW) demonstrate the superiority of the proposed approach on the summarization task. 
650 4 |a Journal Article 
700 1 |a Li, Haopeng  |e verfasserin  |4 aut 
700 1 |a Lu, Xiaoqiang  |e verfasserin  |4 aut 
700 1 |a Li, Xuelong  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 44(2022), 5 vom: 15. Mai, Seite 2793-2801  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:44  |g year:2022  |g number:5  |g day:15  |g month:05  |g pages:2793-2801 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2021.3072117  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 44  |j 2022  |e 5  |b 15  |c 05  |h 2793-2801