LEADER |
01000naa a22002652 4500 |
001 |
NLM32859069X |
003 |
DE-627 |
005 |
20231225203247.0 |
007 |
cr uuu---uuuuu |
008 |
231225s2022 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TPAMI.2021.3100277
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1095.xml
|
035 |
|
|
|a (DE-627)NLM32859069X
|
035 |
|
|
|a (NLM)34314355
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Hu, Weiming
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Interaction-Aware Spatio-Temporal Pyramid Attention Networks for Action Classification
|
264 |
|
1 |
|c 2022
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Completed 16.09.2022
|
500 |
|
|
|a Date Revised 19.11.2022
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status MEDLINE
|
520 |
|
|
|a For CNN-based visual action recognition, accuracy can be increased by focusing on local key action regions. Self-attention focuses on key features and ignores irrelevant information, so it is useful for action recognition. However, current self-attention methods usually ignore correlations among the local feature vectors at spatial positions in CNN feature maps. In this paper, we propose an effective interaction-aware self-attention model that extracts information about the interactions between feature vectors to learn attention maps. Since the different layers in a network capture feature maps at different scales, we introduce a spatial pyramid built from the feature maps at different layers for attention modeling. The multi-scale information is used to obtain more accurate attention scores. These attention scores weight the local feature vectors of the feature maps to produce attentional feature maps. Since the number of feature maps input to the spatial pyramid attention layer is unrestricted, we easily extend this attention layer to a spatio-temporal version. Our model can be embedded in any general CNN to form a video-level end-to-end attention network for action recognition. Several methods for combining the RGB and flow streams are investigated to obtain accurate predictions of human actions. Experimental results show that our method achieves state-of-the-art results on the UCF101, HMDB51, Kinetics-400, and untrimmed Charades datasets
|
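Illustrative sketch (an assumption-laden reading of the abstract above, not the authors' published code): a minimal spatial-pyramid attention layer in PyTorch. The class name PyramidAttention, the shared channel count across pyramid levels, and all tensor shapes are invented for this example. Feature maps from several CNN layers are resized to a common grid, scored jointly by a 1x1 convolution so the scores draw on multi-scale information, and the softmax-normalized attention map then re-weights the local feature vectors of the finest map.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidAttention(nn.Module):
        """Hypothetical sketch of spatial-pyramid attention; not the paper's code."""

        def __init__(self, channels: int, num_levels: int):
            super().__init__()
            # A 1x1 conv scores each spatial position of the stacked pyramid.
            self.score = nn.Conv2d(channels * num_levels, 1, kernel_size=1)

        def forward(self, feature_maps):
            # feature_maps: list of tensors [B, C, H_i, W_i] from different CNN layers.
            h, w = feature_maps[0].shape[-2:]
            # Resize every level to the finest resolution and stack along channels,
            # so each position's score sees all scales at once.
            pyramid = torch.cat(
                [F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
                 for f in feature_maps],
                dim=1,
            )
            scores = self.score(pyramid)                                # [B, 1, H, W]
            # Normalize scores over all spatial positions.
            attn = torch.softmax(scores.flatten(2), dim=-1).view_as(scores)
            # Weight the local feature vectors of the finest feature map.
            return feature_maps[0] * attn

    # Example with three invented feature maps at decreasing resolution.
    maps = [torch.randn(2, 64, 28, 28),
            torch.randn(2, 64, 14, 14),
            torch.randn(2, 64, 7, 7)]
    out = PyramidAttention(channels=64, num_levels=3)(maps)
    print(out.shape)  # torch.Size([2, 64, 28, 28])

A spatio-temporal variant could, as the abstract notes, simply admit more input maps (for example, one pyramid per frame), since the number of feature maps fed to the attention layer is unrestricted.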
650 |
|
4 |
|a Journal Article
|
650 |
|
4 |
|a Research Support, Non-U.S. Gov't
|
700 |
1 |
|
|a Liu, Haowei
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Du, Yang
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Yuan, Chunfeng
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Li, Bing
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Maybank, Stephen
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on pattern analysis and machine intelligence
|d 1979
|g 44(2022), 10 vom: 01. Okt., Seite 7010-7028
|w (DE-627)NLM098212257
|x 1939-3539
|7 nnns
|
773 |
1 |
8 |
|g volume:44
|g year:2022
|g number:10
|g day:01
|g month:10
|g pages:7010-7028
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TPAMI.2021.3100277
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 44
|j 2022
|e 10
|b 01
|c 10
|h 7010-7028
|