Self-Supervised Video-Based Action Recognition With Disturbances

Self-supervised video-based action recognition is a challenging task that requires extracting the principal information characterizing an action from content-diversified videos in large unlabeled datasets. However, most existing methods exploit only the natural spatio-temporal properties of video...
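The abstract describes constructing a visually/semantically disturbed positive for each clip and pulling it toward the original in latent space, without negative samples. As a purely illustrative sketch (not the published VARD implementation), the following PyTorch-style code shows what such a positive-only alignment loss could look like; the encoder, the frame-dropping disturbance, and the additive-noise stand-in for the embedding disturbance are all assumptions.

import torch
import torch.nn.functional as F

def video_disturb(clip: torch.Tensor) -> torch.Tensor:
    # Hypothetical visual disturbance: randomly zero out a few frames.
    # Assumes clip has shape (batch, frames, channels, height, width).
    keep = (torch.rand(clip.shape[0], clip.shape[1], 1, 1, 1, device=clip.device) > 0.1).float()
    return clip * keep

def alignment_loss(encoder: torch.nn.Module, clip: torch.Tensor) -> torch.Tensor:
    # Original embedding and a disturbed positive embedding, both L2-normalized.
    z_orig = F.normalize(encoder(clip), dim=-1)
    z_pos = F.normalize(encoder(video_disturb(clip)), dim=-1)
    # Embedding-level disturbance approximated here by small additive noise.
    z_pos = F.normalize(z_pos + 0.05 * torch.randn_like(z_pos), dim=-1)
    # Pull the positive toward the original: maximize cosine similarity.
    # No negative samples, optical flow, or pretext task is involved.
    return (1.0 - (z_orig * z_pos).sum(dim=-1)).mean()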

Detailed Description

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023), from: 26., pages 2493-2507
First author: Lin, Wei (author)
Other authors: Ding, Xinghao, Huang, Yue, Zeng, Huanqiang
Format: Online article
Language: English
Published: 2023
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000caa a22002652 4500
001 NLM356031128
003 DE-627
005 20240605232101.0
007 cr uuu---uuuuu
008 231226s2023 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2023.3269228  |2 doi 
028 5 2 |a pubmed24n1429.xml 
035 |a (DE-627)NLM356031128 
035 |a (NLM)37099471 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Lin, Wei  |e verfasserin  |4 aut 
245 1 0 |a Self-Supervised Video-Based Action Recognition With Disturbances 
264 1 |c 2023 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 08.05.2023 
500 |a Date Revised 05.06.2024 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Self-supervised video-based action recognition is a challenging task that requires extracting the principal information characterizing an action from content-diversified videos in large unlabeled datasets. However, most existing methods exploit only the natural spatio-temporal properties of video to obtain effective action representations from a visual perspective, while ignoring the semantics that are closer to human cognition. To this end, we propose a self-supervised Video-based Action Recognition method with Disturbances, called VARD, which extracts the principal information of the action in both visual and semantic terms. Specifically, according to cognitive neuroscience research, human recognition ability is activated by both visual and semantic attributes. An intuitive impression is that minor changes to the actor or scene in a video do not affect a person's recognition of the action. On the other hand, different people reach consistent conclusions when recognizing the same action video. In other words, for an action video, the necessary information that remains constant despite disturbances in the visual video or the semantic encoding process is sufficient to represent the action. Therefore, to learn such information, we construct a positive clip/embedding for each action video. Compared to the original clip/embedding, the positive clip/embedding is disturbed visually/semantically by Video Disturbance and Embedding Disturbance. Our objective is to pull the positive closer to the original clip/embedding in the latent space. In this way, the network is driven to focus on the principal information of the action, while the impact of sophisticated details and inconsequential variations is weakened. It is worth mentioning that the proposed VARD does not require optical flow, negative samples, or pretext tasks. Extensive experiments conducted on the UCF101 and HMDB51 datasets demonstrate that the proposed VARD effectively improves a strong baseline and outperforms multiple classical and advanced self-supervised action recognition methods. 
650 4 |a Journal Article 
700 1 |a Ding, Xinghao  |e verfasserin  |4 aut 
700 1 |a Huang, Yue  |e verfasserin  |4 aut 
700 1 |a Zeng, Huanqiang  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 32(2023) vom: 26., Seite 2493-2507  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:32  |g year:2023  |g day:26  |g pages:2493-2507 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2023.3269228  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 32  |j 2023  |b 26  |h 2493-2507