Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition

Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully explore synergies among spatio-temporal modalities eff...


Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30(2021), day 14, pages 5626-5640
Main author: Yu, Zitong (author)
Other authors: Zhou, Benjia, Wan, Jun, Wang, Pichao, Chen, Haoyu, Liu, Xin, Li, Stan Z, Zhao, Guoying
Format: Online article
Language: English
Published: 2021
Access to parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM326735305
003 DE-627
005 20231225195234.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2021.3087348  |2 doi 
028 5 2 |a pubmed24n1089.xml 
035 |a (DE-627)NLM326735305 
035 |a (NLM)34125676 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Yu, Zitong  |e verfasserin  |4 aut 
245 1 0 |a Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 21.06.2021 
500 |a Date Revised 21.06.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully explore synergies among spatio-temporal modalities effectively for gesture recognition. The problems are partially due to the fact that the existing manually designed network architectures have low efficiency in the joint learning of multi-modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which is able to capture rich temporal context via aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among varied modalities. The resultant multi-modal multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics. Comprehensive experiments are performed on three benchmark datasets (IsoGD, NvGesture, and EgoGesture), demonstrating state-of-the-art performance in both single- and multi-modality settings. The code is available at https://github.com/ZitongYu/3DCDC-NAS 
650 4 |a Journal Article 
700 1 |a Zhou, Benjia  |e verfasserin  |4 aut 
700 1 |a Wan, Jun  |e verfasserin  |4 aut 
700 1 |a Wang, Pichao  |e verfasserin  |4 aut 
700 1 |a Chen, Haoyu  |e verfasserin  |4 aut 
700 1 |a Liu, Xin  |e verfasserin  |4 aut 
700 1 |a Li, Stan Z  |e verfasserin  |4 aut 
700 1 |a Zhao, Guoying  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 30(2021) vom: 14., Seite 5626-5640  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:30  |g year:2021  |g day:14  |g pages:5626-5640 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2021.3087348  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 30  |j 2021  |b 14  |h 5626-5640