MOST: Motion Diffusion Model for Rare Text via Temporal Clip Banzhaf Interaction


Bibliographic details
Published in: IEEE transactions on visualization and computer graphics. - 1996. - 31(2025), no. 10, 13 Sept., pp. 8994-9007
Main author: Wang, Yin (Author)
Other authors: Li, Mu; Leng, Zhiying; Li, Frederick W B; Liang, Xiaohui
Format: Online article
Language: English
Published: 2025
Collection: IEEE transactions on visualization and computer graphics
Subjects: Journal Article
LEADER 01000naa a22002652c 4500
001 NLM392025051
003 DE-627
005 20250906233700.0
007 cr uuu---uuuuu
008 250906s2025 xx |||||o 00| ||eng c
024 7 |a 10.1109/TVCG.2025.3588509  |2 doi 
028 5 2 |a pubmed25n1558.xml 
035 |a (DE-627)NLM392025051 
035 |a (NLM)40644091 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wang, Yin  |e author  |4 aut 
245 1 0 |a MOST  |b Motion Diffusion Model for Rare Text via Temporal Clip Banzhaf Interaction 
264 1 |c 2025 
336 |a Text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Date Revised 05.09.2025 
500 |a published: Print 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a We introduce MOST, a novel MOtion diffuSion model via Temporal clip Banzhaf interaction, aimed at addressing the persistent challenge of generating human motion from rare language prompts. While previous approaches struggle with coarse-grained matching and overlook important semantic cues due to motion redundancy, our key insight lies in leveraging fine-grained clip relationships to mitigate these issues. MOST's retrieval stage presents the first formulation of its kind, temporal clip Banzhaf interaction, which precisely quantifies textual-motion coherence at the clip level. This facilitates direct, fine-grained text-to-motion clip matching and eliminates prevalent redundancy. In the generation stage, a motion prompt module effectively utilizes retrieved motion clips to produce semantically consistent movements. Extensive evaluations confirm that MOST achieves state-of-the-art text-to-motion retrieval and generation performance by comprehensively addressing previous challenges, as demonstrated through quantitative and qualitative results highlighting its effectiveness, especially for rare prompts. 
650 4 |a Journal Article 
700 1 |a Li, Mu  |e author  |4 aut 
700 1 |a Leng, Zhiying  |e author  |4 aut 
700 1 |a Li, Frederick W B  |e author  |4 aut 
700 1 |a Liang, Xiaohui  |e author  |4 aut 
773 0 8 |i Contained in  |t IEEE transactions on visualization and computer graphics  |d 1996  |g 31(2025), no. 10, 13 Sept., pp. 8994-9007  |w (DE-627)NLM098269445  |x 1941-0506  |7 nnas 
773 1 8 |g volume:31  |g year:2025  |g number:10  |g day:13  |g month:09  |g pages:8994-9007 
856 4 0 |u http://dx.doi.org/10.1109/TVCG.2025.3588509  |3 Full text 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 31  |j 2025  |e 10  |b 13  |c 09  |h 8994-9007
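The abstract's central quantity builds on the Banzhaf interaction index from cooperative game theory. As a hedged illustration only (this is not the authors' temporal clip formulation, whose value function and clip segmentation are not given in this record), the sketch below computes the exact pairwise Banzhaf interaction between two players by averaging the second-order difference v(S ∪ {i, j}) - v(S ∪ {i}) - v(S ∪ {j}) + v(S) over every coalition S of the remaining players. The text tokens, motion clips, similarity scores, and both value functions (`pairwise_game`, `best_match_game`) are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Sequence

# A cooperative game maps a coalition (a set of players) to a real value.
Game = Callable[[FrozenSet[Hashable]], float]


def banzhaf_interaction(players: Sequence[Hashable], v: Game,
                        i: Hashable, j: Hashable) -> float:
    """Exact pairwise Banzhaf interaction index of players i and j in game v.

    Averages v(S + {i, j}) - v(S + {i}) - v(S + {j}) + v(S) uniformly over
    all coalitions S drawn from the players other than i and j.
    Enumerates 2^(n-2) coalitions, so it is only feasible for small n.
    """
    if i == j:
        raise ValueError("i and j must be distinct players")
    others = [p for p in players if p != i and p != j]
    total, count = 0.0, 0
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            s = frozenset(combo)
            # Joint gain of adding i and j together, beyond their solo gains.
            total += v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
            count += 1
    return total / count


# Hypothetical text tokens, motion clips, and token-clip similarity scores.
SIM = {
    ("walk", "clip0"): 0.9, ("walk", "clip1"): 0.2,
    ("jump", "clip0"): 0.1, ("jump", "clip1"): 0.8,
}
players = ["walk", "jump", "clip0", "clip1"]


def pairwise_game(s: FrozenSet[Hashable]) -> float:
    """Hypothetical additive game: sum of similarities of all token-clip pairs in s."""
    return sum(w for (t, c), w in SIM.items() if t in s and c in s)


def best_match_game(s: FrozenSet[Hashable]) -> float:
    """Hypothetical game: best single token-clip similarity in s (0.0 if none)."""
    return max((w for (t, c), w in SIM.items() if t in s and c in s), default=0.0)


print(banzhaf_interaction(players, pairwise_game, "walk", "clip0"))    # ~0.9
print(banzhaf_interaction(players, best_match_game, "walk", "clip0"))  # ~0.625
```

In the additive game the interaction recovers the raw pair similarity (0.9), because only the ("walk", "clip0") term depends on both players. In the best-match game it falls to about 0.625, since other tokens and clips already supply strong matches in many coalitions; this is loosely analogous to how an interaction-based score can discount redundant pairs, though how MOST defines its players and value function is not stated in this record.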