Contrast-Reconstruction Representation Learning for Self-Supervised Skeleton-Based Action Recognition

Skeleton-based action recognition is widely used in varied areas, e.g., surveillance and human-machine interaction. Existing models are mainly learned in a supervised manner, thus heavily depending on large-scale labeled data, which could be infeasible when labels are prohibitively expensive. In thi...

Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), day: 23, pages 6224-6238
Main author: Wang, Peng (author)
Other authors: Wen, Jun, Si, Chenyang, Qian, Yuntao, Wang, Liang
Format: Online article
Language: English
Published: 2022
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM346663334
003 DE-627
005 20231226032006.0
007 cr uuu---uuuuu
008 231226s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2022.3207577  |2 doi 
028 5 2 |a pubmed24n1155.xml 
035 |a (DE-627)NLM346663334 
035 |a (NLM)36149998 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wang, Peng  |e verfasserin  |4 aut 
245 1 0 |a Contrast-Reconstruction Representation Learning for Self-Supervised Skeleton-Based Action Recognition 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 30.09.2022 
500 |a Date Revised 30.09.2022 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a Skeleton-based action recognition is widely used in varied areas, e.g., surveillance and human-machine interaction. Existing models are mainly learned in a supervised manner, thus heavily depending on large-scale labeled data, which could be infeasible when labels are prohibitively expensive. In this paper, we propose a novel Contrast-Reconstruction Representation Learning network (CRRL) that simultaneously captures postures and motion dynamics for unsupervised skeleton-based action recognition. It consists of three parts: Sequence Reconstructor (SER), Contrastive Motion Learner (CML), and Information Fuser (INF). SER learns representations from skeleton coordinate sequences via reconstruction. However, the learned representation tends to focus on trivial postural coordinates and to be hesitant in learning motion. To enhance the learning of motions, CML performs contrastive learning between the representations learned from coordinate sequences and from additional velocity sequences, respectively. Finally, in the INF module, we explore varied strategies to combine SER and CML, and propose to couple postures and motions via a knowledge-distillation-based fusion strategy which transfers the motion learning from CML to SER. Experimental results on several benchmarks, i.e., NTU RGB+D 60/120, PKU-MMD, CMU, and NW-UCLA, demonstrate the promise of our method by outperforming state-of-the-art approaches. 
650 4 |a Journal Article 
700 1 |a Wen, Jun  |e verfasserin  |4 aut 
700 1 |a Si, Chenyang  |e verfasserin  |4 aut 
700 1 |a Qian, Yuntao  |e verfasserin  |4 aut 
700 1 |a Wang, Liang  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 31(2022) vom: 23., Seite 6224-6238  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:31  |g year:2022  |g day:23  |g pages:6224-6238 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2022.3207577  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 31  |j 2022  |b 23  |h 6224-6238