Sequence-to-Segments Networks for Detecting Segments in Videos

Detecting segments of interest from videos is a common problem for many applications. And yet it is a challenging problem as it often requires not only knowledge of individual target segments, but also contextual understanding of the entire video and the relationships between the target segments. To...

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 43(2021), no. 3 (6 March), pp. 1009-1021
Main author: Wei, Zijun (author)
Other authors: Wang, Boyu, Hoai, Minh, Zhang, Jianming, Shen, Xiaohui, Lin, Zhe, Mech, Radomir, Samaras, Dimitris
Format: Online article
Language: English
Published: 2021
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM301226849
003 DE-627
005 20231225104207.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2019.2940225  |2 doi 
028 5 2 |a pubmed24n1004.xml 
035 |a (DE-627)NLM301226849 
035 |a (NLM)31514124 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wei, Zijun  |e verfasserin  |4 aut 
245 1 0 |a Sequence-to-Segments Networks for Detecting Segments in Videos 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 05.02.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Detecting segments of interest from videos is a common problem for many applications. And yet it is a challenging problem as it often requires not only knowledge of individual target segments, but also contextual understanding of the entire video and the relationships between the target segments. To address this problem, we propose the Sequence-to-Segments Network (S2N), a novel and general end-to-end sequential encoder-decoder architecture. S2N first encodes the input video into a sequence of hidden states that capture information progressively, as it appears in the video. It then employs the Segment Detection Unit (SDU), a novel decoding architecture, that sequentially detects segments. At each decoding step, the SDU integrates the decoder state and encoder hidden states to detect a target segment. During training, we address the problem of finding the best assignment of predicted segments to ground truth using the Hungarian Matching Algorithm with Lexicographic Cost. Additionally we propose to use the squared Earth Mover's Distance to optimize the localization errors of the segments. We show the state-of-the-art performance of S2N across numerous tasks, including video highlighting, video summarization, and human action proposal generation 
650 4 |a Journal Article 
700 1 |a Wang, Boyu  |e verfasserin  |4 aut 
700 1 |a Hoai, Minh  |e verfasserin  |4 aut 
700 1 |a Zhang, Jianming  |e verfasserin  |4 aut 
700 1 |a Shen, Xiaohui  |e verfasserin  |4 aut 
700 1 |a Lin, Zhe  |e verfasserin  |4 aut 
700 1 |a Mech, Radomir  |e verfasserin  |4 aut 
700 1 |a Samaras, Dimitris  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 43(2021), 3 vom: 06. März, Seite 1009-1021  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:43  |g year:2021  |g number:3  |g day:06  |g month:03  |g pages:1009-1021 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2019.2940225  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 43  |j 2021  |e 3  |b 06  |c 03  |h 1009-1021