Dubbing Movies via Hierarchical Phoneme Modeling and Acoustic Diffusion Denoising

Given a piece of text, a video clip, and reference audio, the movie dubbing (also known as Visual Voice Cloning, V2C) task aims to generate speeches that clone reference voice and align well with the video in both emotion and lip movement, which is more challenging than conventional text-to-speech s...


Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 47(2025), 11 of 2 Oct., pages 10361-10377
Main author: Li, Liang (Author)
Other authors: Cong, Gaoxiang, Qi, Yuankai, Zha, Zheng-Jun, Wu, Qi, Sheng, Quan Z, Huang, Qingming, Yang, Ming-Hsuan
Format: Online article
Language: English
Published: 2025
Collection access: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article