LEADER |
01000naa a22002652 4500 |
001 |
NLM330966480 |
003 |
DE-627 |
005 |
20231225212430.0 |
007 |
cr uuu---uuuuu |
008 |
231225s2021 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TIP.2021.3113570
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1103.xml
|
035 |
|
|
|a (DE-627)NLM330966480
|
035 |
|
|
|a (NLM)34554915
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Pramono, Rizard Renanda Adhi
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Relational Reasoning for Group Activity Recognition via Self-Attention Augmented Conditional Random Field
|
264 |
|
1 |
|c 2021
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Completed 29.09.2021
|
500 |
|
|
|a Date Revised 29.09.2021
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status PubMed-not-MEDLINE
|
520 |
|
|
|a This paper presents a new relational network for group activity recognition. The essence of the network is to integrate conditional random fields (CRFs) with self-attention to infer the temporal dependencies and spatial relationships of the actors. This combination takes advantage of the capability of CRFs to model the actors' mutually dependent features and the capability of self-attention to learn the temporal evolution and spatial relational contexts of every actor in videos. Additionally, there are two distinct facets of our CRF and self-attention. First, the pairwise energy of the new CRF relies on both the temporal self-attention and the spatial self-attention, which apply the self-attention mechanism to the features in time and space, respectively. Second, to address both local and non-local relationships in group activities, the spatial self-attention takes into account a collection of cliques with different scales of spatial locality. The associated mean-field inference can thus be reformulated as a self-attention network to generate the relational contexts of the actors and their individual action labels. Lastly, a bidirectional universal transformer encoder (UTE) is utilized to aggregate the forward and backward temporal context information, scene information, and relational contexts for group activity recognition. A new loss function is also employed, consisting of not only the cost for the classification of individual actions and group activities, but also a contrastive loss to address the miscellaneous relational contexts between actors. Simulations show that the new approach surpasses previous works on four commonly used datasets.
|
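[Editor's illustration, not part of the catalog record: the abstract's central idea is to rewrite one CRF mean-field update over actor features as a self-attention layer, so that attention weights play the role of the pairwise energies. The Python/PyTorch sketch below is a minimal illustration under stated assumptions only; the class name, feature dimensions, and the residual update rule are hypothetical choices for exposition and are not taken from the authors' implementation.]

# A minimal sketch (assumed formulation, not the authors' code) of one
# CRF mean-field iteration expressed as scaled dot-product self-attention
# over per-actor features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionMeanFieldStep(nn.Module):
    """One mean-field update over actor features, written as self-attention.
    The attention matrix stands in for the CRF pairwise energies."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # query projection
        self.k = nn.Linear(dim, dim)  # key projection
        self.v = nn.Linear(dim, dim)  # value projection (pairwise message)

    def forward(self, actors: torch.Tensor) -> torch.Tensor:
        # actors: (batch, num_actors, dim) per-actor appearance features
        q, k, v = self.q(actors), self.k(actors), self.v(actors)
        scale = actors.size(-1) ** 0.5
        # attention weights act as (normalized) pairwise energies
        attn = F.softmax(q @ k.transpose(-2, -1) / scale, dim=-1)
        # message passing: aggregate the other actors' features, then
        # combine with the unary term via a residual connection (assumed)
        return actors + attn @ v

if __name__ == "__main__":
    x = torch.randn(2, 12, 64)           # 2 clips, 12 actors, 64-d features
    step = SelfAttentionMeanFieldStep(64)
    print(step(x).shape)                  # torch.Size([2, 12, 64])

[In the paper this step would be iterated and fed, together with scene and temporal context, into the bidirectional UTE described in the abstract; the sketch shows only the single spatial update.]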
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Fang, Wen-Hsien
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Chen, Yie-Tarng
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
|d 1992
|g 30(2021) vom: 23., Seite 8184-8199
|w (DE-627)NLM09821456X
|x 1941-0042
|7 nnns
|
773 |
1 |
8 |
|g volume:30
|g year:2021
|g day:23
|g pages:8184-8199
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TIP.2021.3113570
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 30
|j 2021
|b 23
|h 8184-8199
|