Modeling Sub-Actions for Weakly Supervised Temporal Action Localization

As a challenging task of high-level video understanding, weakly supervised temporal action localization has attracted more attention recently. Due to the usage of video-level category labels, this task is usually formulated as the task of classification, which always suffers from the contradiction b...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30(2021) vom: 19., Seite 5154-5167
1. Verfasser:	Huang, Linjiang (VerfasserIn)
Weitere Verfasser:	Huang, Yan, Ouyang, Wanli, Wang, Liang
Format:	Online-Aufsatz
Sprache:	English
Veröffentlicht:	2021
Zugriff auf das übergeordnete Werk:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Schlagworte:	Journal Article


LEADER	01000naa a22002652 4500
001	NLM325352755
003	DE-627
005	20231225192246.0
007	cr uuu---uuuuu
008	231225s2021 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2021.3078324 \|2 doi
028	5	2	\|a pubmed24n1084.xml
035			\|a (DE-627)NLM325352755
035			\|a (NLM)33983884
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Huang, Linjiang \|e verfasserin \|4 aut
245	1	0	\|a Modeling Sub-Actions for Weakly Supervised Temporal Action Localization
264		1	\|c 2021
336			\|a Text \|b txt \|2 rdacontent
337			\|a ƒaComputermedien \|b c \|2 rdamedia
338			\|a ƒa Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 25.05.2021
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a As a challenging task of high-level video understanding, weakly supervised temporal action localization has attracted more attention recently. Due to the usage of video-level category labels, this task is usually formulated as the task of classification, which always suffers from the contradiction between classification and detection. In this paper, we describe a novel approach to alleviate the contradiction for detecting more complete action instances by explicitly modeling sub-actions. Our method makes use of three innovations to model the latent sub-actions. First, our framework uses prototypes to represent sub-actions, which can be automatically learned in an end-to-end way. Second, we regard the relations among sub-actions as a graph, and construct the correspondences between sub-actions and actions by the graph pooling operation. Doing so not only makes the sub-actions inter-dependent to facilitate the multi-label setting, but also naturally use the video-level labels as weak supervision. Third, we devise three complementary loss functions, namely, representation loss, balance loss and relation loss to ensure the learned sub-actions are diverse and have clear semantic meanings. Experimental results on THUMOS14 and ActivityNet1.3 datasets demonstrate the effectiveness of our method and superior performance over state-of-the-art approaches
650		4	\|a Journal Article
700	1		\|a Huang, Yan \|e verfasserin \|4 aut
700	1		\|a Ouyang, Wanli \|e verfasserin \|4 aut
700	1		\|a Wang, Liang \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g 30(2021) vom: 19., Seite 5154-5167 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnns
773	1	8	\|g volume:30 \|g year:2021 \|g day:19 \|g pages:5154-5167
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2021.3078324 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 30 \|j 2021 \|b 19 \|h 5154-5167