Dubbing Movies via Hierarchical Phoneme Modeling and Acoustic Diffusion Denoising

Given a piece of text, a video clip, and reference audio, the movie dubbing (also known as Visual Voice Cloning, V2C) task aims to generate speech that clones the reference voice and aligns well with the video in both emotion and lip movement, which is more challenging than conventional text-to-speech s...

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 47(2025), 11, 08 Oct., pages 10361-10377
Main author: Li, Liang (Author)
Other authors: Cong, Gaoxiang; Qi, Yuankai; Zha, Zheng-Jun; Wu, Qi; Sheng, Quan Z; Huang, Qingming; Yang, Ming-Hsuan
Format: Online article
Language: English
Published: 2025
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article