Temporal Segment Networks for Action Recognition in Videos

We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. This unique design enables the TSN framework to efficiently learn action models by using the whole video. The learned models could be easily deployed for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the implementation of the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on five challenging action recognition benchmarks: HMDB51 (71.0 percent), UCF101 (94.9 percent), THUMOS14 (80.1 percent), ActivityNet v1.2 (89.6 percent), and Kinetics400 (75.7 percent). In addition, using the proposed RGB difference as a simple motion representation, our method can still achieve competitive accuracy on UCF101 (91.0 percent) while running at 340 FPS. Furthermore, based on the proposed TSN framework, we won the video classification track at the ActivityNet challenge 2016 among 24 teams.
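In code, the segment-based sampling and the average-pooling ("segmental consensus") aggregation described in the abstract work roughly as follows. This is a minimal sketch, not the authors' released implementation: the helper names sample_snippets, video_score, and rgb_difference are ours, and snippet_model stands in for any 2D ConvNet that maps a single frame to class scores.

```python
import numpy as np

def sample_snippets(num_frames, num_segments=3, train=True):
    """Divide the video into num_segments equal segments and pick one
    frame index per segment: random within the segment during training,
    the segment center at test time."""
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(k * seg_len)
        end = max(start + 1, int((k + 1) * seg_len))
        indices.append(np.random.randint(start, end) if train
                       else (start + end) // 2)
    return indices

def video_score(frames, snippet_model, num_segments=3):
    """Segmental consensus: average the per-snippet class scores so the
    whole video contributes to a single video-level prediction."""
    idx = sample_snippets(len(frames), num_segments, train=False)
    scores = np.stack([snippet_model(frames[i]) for i in idx])  # (K, classes)
    return scores.mean(axis=0)

def rgb_difference(frames, i):
    """RGB difference of neighboring frames: a cheap motion cue that
    avoids optical-flow computation (signed, hence the int16 cast)."""
    return frames[i + 1].astype(np.int16) - frames[i].astype(np.int16)
```

At training time the same averaging is applied before the loss, so one backward pass uses snippets from all segments jointly; for untrimmed videos, the plain average is replaced by the paper's multi-scale temporal window integration.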

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 41(2019), 11, dated 05 Nov., pages 2740-2755
Main author: Wang, Limin (Author)
Other authors: Xiong, Yuanjun, Wang, Zhe, Qiao, Yu, Lin, Dahua, Tang, Xiaoou, Van Gool, Luc
Format: Online article
Language: English
Published: 2019
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM288211960
003 DE-627
005 20231225055747.0
007 cr uuu---uuuuu
008 231225s2019 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2018.2868668  |2 doi 
028 5 2 |a pubmed24n0960.xml 
035 |a (DE-627)NLM288211960 
035 |a (NLM)30183621 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wang, Limin  |e verfasserin  |4 aut 
245 1 0 |a Temporal Segment Networks for Action Recognition in Videos 
264 1 |c 2019 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 04.03.2020 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. This unique design enables the TSN framework to efficiently learn action models by using the whole video. The learned models could be easily deployed for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the implementation of the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on five challenging action recognition benchmarks: HMDB51 (71.0 percent), UCF101 (94.9 percent), THUMOS14 (80.1 percent), ActivityNet v1.2 (89.6 percent), and Kinetics400 (75.7 percent). In addition, using the proposed RGB difference as a simple motion representation, our method can still achieve competitive accuracy on UCF101 (91.0 percent) while running at 340 FPS. Furthermore, based on the proposed TSN framework, we won the video classification track at the ActivityNet challenge 2016 among 24 teams.
650 4 |a Journal Article 
700 1 |a Xiong, Yuanjun  |e verfasserin  |4 aut 
700 1 |a Wang, Zhe  |e verfasserin  |4 aut 
700 1 |a Qiao, Yu  |e verfasserin  |4 aut 
700 1 |a Lin, Dahua  |e verfasserin  |4 aut 
700 1 |a Tang, Xiaoou  |e verfasserin  |4 aut 
700 1 |a Van Gool, Luc  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 41(2019), 11 vom: 05. Nov., Seite 2740-2755  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:41  |g year:2019  |g number:11  |g day:05  |g month:11  |g pages:2740-2755 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2018.2868668  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 41  |j 2019  |e 11  |b 05  |c 11  |h 2740-2755
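The raw MARC dump above is itself a simple data structure: a three-digit tag per line, optional indicators, then subfields introduced by |a, |b, and so on. A minimal, hypothetical sketch for pulling fields out of such a display dump (the function name parse_marc_display is ours; real MARC processing would use a proper library on the binary or XML record):

```python
import re
from collections import defaultdict

def parse_marc_display(text):
    """Parse a pipe-delimited MARC display dump (like the record above)
    into {tag: [{subfield_code: value, ...}, ...]}. Control fields and
    the LEADER carry no |x subfields and are skipped here; repeated
    subfield codes (e.g. the multiple |g in field 773) keep only the
    last value in this dict-based sketch."""
    fields = defaultdict(list)
    for line in text.splitlines():
        m = re.match(r"(\d{3}|LEADER)\s+(.*)", line.strip())
        if not m:
            continue
        tag, rest = m.groups()
        subfields = {code: value.strip()
                     for code, value in re.findall(r"\|(\w)\s*([^|]*)", rest)}
        if subfields:
            fields[tag].append(subfields)
    return fields

# Usage, assuming `dump` holds the record text above:
# parse_marc_display(dump)["245"][0]["a"]
#   -> "Temporal Segment Networks for Action Recognition in Videos"
# parse_marc_display(dump)["024"][0]["a"] -> "10.1109/TPAMI.2018.2868668"
```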