Sequence-to-Segments Networks for Detecting Segments in Videos

Bibliographic Details
Published in:	IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1979. - 43(2021), no. 3 (6 March), pages 1009-1021
Main author:	Wei, Zijun (Author)
Other authors:	Wang, Boyu; Hoai, Minh; Zhang, Jianming; Shen, Xiaohui; Lin, Zhe; Mech, Radomir; Samaras, Dimitris
Format:	Online article
Language:	English
Published:	2021
Access to parent work:	IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects:	Journal Article

Description
Abstract:	Detecting segments of interest from videos is a common problem for many applications. And yet it is a challenging problem as it often requires not only knowledge of individual target segments, but also contextual understanding of the entire video and the relationships between the target segments. To address this problem, we propose the Sequence-to-Segments Network (S2N), a novel and general end-to-end sequential encoder-decoder architecture. S2N first encodes the input video into a sequence of hidden states that capture information progressively, as it appears in the video. It then employs the Segment Detection Unit (SDU), a novel decoding architecture, that sequentially detects segments. At each decoding step, the SDU integrates the decoder state and encoder hidden states to detect a target segment. During training, we address the problem of finding the best assignment of predicted segments to ground truth using the Hungarian Matching Algorithm with Lexicographic Cost. Additionally, we propose to use the squared Earth Mover's Distance to optimize the localization errors of the segments. We show the state-of-the-art performance of S2N across numerous tasks, including video highlighting, video summarization, and human action proposal generation.
Notes:	Date revised: 05.02.2021; published: Print-Electronic; citation status: PubMed-not-MEDLINE
ISSN:	1939-3539
DOI:	10.1109/TPAMI.2019.2940225
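
Illustrative note on the squared Earth Mover's Distance mentioned in the abstract. This record does not give the paper's exact loss, so the following is only a minimal sketch under stated assumptions: a segment boundary is predicted as a probability distribution over frame positions, the ground truth is a one-hot distribution over the same positions, and the squared EMD is taken in one common 1-D form, the sum of squared differences between the two cumulative distributions. The function name `squared_emd` and the example values are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def squared_emd(pred, target):
    """Squared EMD between two 1-D distributions over the same ordered bins.

    Assumption: both inputs are normalized and share one bin ordering
    (here, frame positions). The value is the sum of squared differences
    between their cumulative distributions.
    """
    if len(pred) != len(target):
        raise ValueError("distributions must have the same number of bins")
    cdf_pred = accumulate(pred)      # running sums of the prediction
    cdf_target = accumulate(target)  # running sums of the ground truth
    return sum((p - t) ** 2 for p, t in zip(cdf_pred, cdf_target))


# Hypothetical example: the true boundary is at position 2 of 5.
target = [0.0, 0.0, 1.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]  # prediction off by one position
far = [0.0, 0.0, 0.0, 0.0, 1.0]   # prediction off by two positions

print(squared_emd(near, target), squared_emd(far, target))
```

Unlike a plain cross-entropy on the one-hot target, this quantity grows with the distance between the predicted and true positions, which is presumably why a distance-aware loss suits localization.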