Speech-Driven Personalized Gesture Synthetics : Harnessing Automatic Fuzzy Feature Inference

Bibliographic details
Published in: IEEE transactions on visualization and computer graphics. - 1996. - 30(2024), no. 10, dated 28 Sept., pages 6984-6996
First author: Zhang, Fan (author)
Additional authors: Wang, Zhaohan, Lyu, Xin, Zhao, Siyuan, Li, Mengjian, Geng, Weidong, Ji, Naye, Du, Hui, Gao, Fuxing, Wu, Hao, Li, Shunman
Format: Online article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on visualization and computer graphics
Subject headings: Journal Article
LEADER 01000caa a22002652 4500
001 NLM37145753X
003 DE-627
005 20240906232701.0
007 cr uuu---uuuuu
008 240426s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TVCG.2024.3393236  |2 doi 
028 5 2 |a pubmed24n1525.xml 
035 |a (DE-627)NLM37145753X 
035 |a (NLM)38656863 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Zhang, Fan  |e author  |4 aut 
245 1 0 |a Speech-Driven Personalized Gesture Synthetics  |b Harnessing Automatic Fuzzy Feature Inference 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Date Completed 04.09.2024 
500 |a Date Revised 05.09.2024 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a Speech-driven gesture generation is an emerging field within virtual human creation. However, a significant challenge lies in accurately determining and processing the multitude of input features (such as acoustic, semantic, emotional, personality, and even subtle unknown features). Traditional approaches, reliant on various explicit feature inputs and complex multimodal processing, constrain the expressiveness of resulting gestures and limit their applicability. To address these challenges, we present Persona-Gestor, a novel end-to-end generative model designed to generate highly personalized 3D full-body gestures relying solely on raw speech audio. The model combines a fuzzy feature extractor and a non-autoregressive Adaptive Layer Normalization (AdaLN) transformer diffusion architecture (based on Diffusion Transformers, DiTs). The fuzzy feature extractor harnesses a fuzzy inference strategy that automatically infers implicit, continuous fuzzy features. These fuzzy features, represented as a unified latent feature, are fed into the AdaLN transformer. The AdaLN transformer introduces a conditional mechanism that applies a uniform function across all tokens, thereby effectively modeling the correlation between the fuzzy features and the gesture sequence. This module ensures a high level of gesture-speech synchronization while preserving naturalness. Finally, we employ the diffusion model to train and infer various gestures. Extensive subjective and objective evaluations on the Trinity, ZEGGS, and BEAT datasets confirm our model's superior performance compared with current state-of-the-art approaches. Persona-Gestor improves the system's usability and generalization capabilities, setting a new benchmark in speech-driven gesture synthesis and broadening the horizon for virtual human technology. 
650 4 |a Journal Article 
700 1 |a Wang, Zhaohan  |e author  |4 aut 
700 1 |a Lyu, Xin  |e author  |4 aut 
700 1 |a Zhao, Siyuan  |e author  |4 aut 
700 1 |a Li, Mengjian  |e author  |4 aut 
700 1 |a Geng, Weidong  |e author  |4 aut 
700 1 |a Ji, Naye  |e author  |4 aut 
700 1 |a Du, Hui  |e author  |4 aut 
700 1 |a Gao, Fuxing  |e author  |4 aut 
700 1 |a Wu, Hao  |e author  |4 aut 
700 1 |a Li, Shunman  |e author  |4 aut 
773 0 8 |i Contained in  |t IEEE transactions on visualization and computer graphics  |d 1996  |g 30(2024), no. 10, dated 28 Sept., pages 6984-6996  |w (DE-627)NLM098269445  |x 1941-0506  |7 nnns 
773 1 8 |g volume:30  |g year:2024  |g number:10  |g day:28  |g month:09  |g pages:6984-6996 
856 4 0 |u http://dx.doi.org/10.1109/TVCG.2024.3393236  |3 Full text 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 30  |j 2024  |e 10  |b 28  |c 09  |h 6984-6996
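The 520 abstract says the AdaLN transformer "applies a uniform function across all tokens", conditioned on a unified latent feature. The NumPy sketch below illustrates that idea in the generic DiT style. A projection regresses one scale vector and one shift vector from the conditioning vector, and both are broadcast over every token after a parameter-free LayerNorm. This is not the authors' code. The function names `layer_norm` and `ada_ln`, the single per-sequence conditioning vector, the linear projection, and all shapes are hypothetical choices for illustration. The gating term used in DiT's adaLN-Zero blocks is omitted.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token (row) to zero mean and unit variance; no learned affine."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def ada_ln(x, cond, w, b):
    """Adaptive LayerNorm, DiT-style sketch (hypothetical shapes, not the paper's code).

    x:    (T, D) token sequence, e.g. T gesture frames of width D
    cond: (C,)   conditioning vector, standing in for the unified latent feature
    w, b: (C, 2*D) and (2*D,) linear projection that regresses scale and shift
    """
    params = cond @ w + b               # (2*D,)
    gamma, beta = np.split(params, 2)   # each (D,)
    # gamma and beta do not depend on the token index, so the same
    # affine function is applied to every one of the T tokens.
    return layer_norm(x) * (1.0 + gamma) + beta

# Usage with random data (assumed sizes).
rng = np.random.default_rng(0)
T, D, C = 8, 16, 32
x = rng.standard_normal((T, D))
cond = rng.standard_normal(C)
w = rng.standard_normal((C, 2 * D)) * 0.02
b = np.zeros(2 * D)
y = ada_ln(x, cond, w, b)
print(y.shape)  # (8, 16)
```

If the projection is all zeros, `ada_ln` reduces to a plain LayerNorm. DiT's adaLN-Zero variant relies on this by zero-initializing its modulation layer. Because `gamma` and `beta` are shared across positions, two identical input frames always map to identical outputs.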