High-Fidelity and High-Efficiency Talking Portrait Synthesis With Detail-Aware Neural Radiance Fields

In this paper, we propose a novel rendering framework based on neural radiance fields (NeRF) named HH-NeRF that can generate high-resolution audio-driven talking portrait videos with high fidelity and fast rendering. Specifically, our framework includes a detail-aware NeRF module and an efficient co...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on visualization and computer graphics. - 1996. - PP(2024) vom: 31. Okt.
1. Verfasser: Wang, Muyu (VerfasserIn)
Weitere Verfasser: Zhao, Sanyuan, Dong, Xingping, Shen, Jianbing
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2024
Zugriff auf das übergeordnete Werk:IEEE transactions on visualization and computer graphics
Schlagworte:Journal Article
LEADER 01000caa a22002652 4500
001 NLM379673266
003 DE-627
005 20241102232746.0
007 cr uuu---uuuuu
008 241101s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TVCG.2024.3488960  |2 doi 
028 5 2 |a pubmed24n1588.xml 
035 |a (DE-627)NLM379673266 
035 |a (NLM)39480716 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wang, Muyu  |e verfasserin  |4 aut 
245 1 0 |a High-Fidelity and High-Efficiency Talking Portrait Synthesis With Detail-Aware Neural Radiance Fields 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a ƒaComputermedien  |b c  |2 rdamedia 
338 |a ƒa Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 01.11.2024 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a In this paper, we propose a novel rendering framework based on neural radiance fields (NeRF) named HH-NeRF that can generate high-resolution audio-driven talking portrait videos with high fidelity and fast rendering. Specifically, our framework includes a detail-aware NeRF module and an efficient conditional super-resolution module. Firstly, a detail-aware NeRF is proposed to efficiently generate a high-fidelity low-resolution talking head, by using the encoded volume density estimation and audio-eye-aware color calculation. This module can capture natural eye blinks and high-frequency details, and maintain a similar rendering time as previous fast methods. Secondly, we present an efficient conditional super-resolution module on the dynamic scene to directly generate the high-resolution portrait with our low-resolution head. Incorporated with the prior information, such as depth map and audio features, our new proposed efficient conditional super resolution module can adopt a lightweight network to efficiently generate realistic and distinct high-resolution videos. Extensive experiments demonstrate that our method can generate more distinct and fidelity talking portraits on high resolution (900 × 900) videos compared to state-of-the-art methods. Our code is available at https://github.com/muyuWang/HHNeRF 
650 4 |a Journal Article 
700 1 |a Zhao, Sanyuan  |e verfasserin  |4 aut 
700 1 |a Dong, Xingping  |e verfasserin  |4 aut 
700 1 |a Shen, Jianbing  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on visualization and computer graphics  |d 1996  |g PP(2024) vom: 31. Okt.  |w (DE-627)NLM098269445  |x 1941-0506  |7 nnns 
773 1 8 |g volume:PP  |g year:2024  |g day:31  |g month:10 
856 4 0 |u http://dx.doi.org/10.1109/TVCG.2024.3488960  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2024  |b 31  |c 10