Dubbing Movies via Hierarchical Phoneme Modeling and Acoustic Diffusion Denoising
Given a piece of text, a video clip, and reference audio, the movie dubbing task (also known as Visual Voice Cloning, V2C) aims to generate speech that clones the reference voice and aligns well with the video in both emotion and lip movement, which is more challenging than conventional text-to-speech s...
Bibliographic Details
Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 47(2025), 11, 2 Oct., pages 10361-10377
|
Main author: |
Li, Liang
(Author) |
Other authors: |
Cong, Gaoxiang,
Qi, Yuankai,
Zha, Zheng-Jun,
Wu, Qi,
Sheng, Quan Z,
Huang, Qingming,
Yang, Ming-Hsuan |
Format: | Online article
|
Language: | English |
Published: |
2025
|
Collection access: | IEEE transactions on pattern analysis and machine intelligence
|
Subjects: | Journal Article |