Outdoor Inverse Rendering From a Single Image Using Multiview Self-Supervision

In this paper we show how to perform scene-level inverse rendering to recover shape, reflectance and lighting from a single, uncontrolled image using a fully convolutional neural network. The network takes an RGB image as input, regresses albedo, shadow and normal maps from which we infer least squa...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), 7 vom: 01. Juli, Seite 3659-3675
1. Verfasser: Yu, Ye (VerfasserIn)
Weitere Verfasser: Smith, William A P
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2022
Zugriff auf das übergeordnete Werk:IEEE transactions on pattern analysis and machine intelligence
Schlagworte:Journal Article
LEADER 01000naa a22002652 4500
001 NLM32121286X
003 DE-627
005 20231225175329.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2021.3058105  |2 doi 
028 5 2 |a pubmed24n1070.xml 
035 |a (DE-627)NLM32121286X 
035 |a (NLM)33560977 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Yu, Ye  |e verfasserin  |4 aut 
245 1 0 |a Outdoor Inverse Rendering From a Single Image Using Multiview Self-Supervision 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a ƒaComputermedien  |b c  |2 rdamedia 
338 |a ƒa Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 07.06.2022 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a In this paper we show how to perform scene-level inverse rendering to recover shape, reflectance and lighting from a single, uncontrolled image using a fully convolutional neural network. The network takes an RGB image as input, regresses albedo, shadow and normal maps from which we infer least squares optimal spherical harmonic lighting coefficients. Our network is trained using large uncontrolled multiview and timelapse image collections without ground truth. By incorporating a differentiable renderer, our network can learn from self-supervision. Since the problem is ill-posed we introduce additional supervision. Our key insight is to perform offline multiview stereo (MVS) on images containing rich illumination variation. From the MVS pose and depth maps, we can cross project between overlapping views such that Siamese training can be used to ensure consistent estimation of photometric invariants. MVS depth also provides direct coarse supervision for normal map estimation. We believe this is the first attempt to use MVS supervision for learning inverse rendering. In addition, we learn a statistical natural illumination prior. We evaluate performance on inverse rendering, normal map estimation and intrinsic image decomposition benchmarks 
650 4 |a Journal Article 
700 1 |a Smith, William A P  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 44(2022), 7 vom: 01. Juli, Seite 3659-3675  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:44  |g year:2022  |g number:7  |g day:01  |g month:07  |g pages:3659-3675 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2021.3058105  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 44  |j 2022  |e 7  |b 01  |c 07  |h 3659-3675