Self-Supervised Monocular Depth Estimation With Multiscale Perception

Extracting 3D information from a single optical image is very attractive. Recently emerging self-supervised methods can learn depth representations without using ground truth depth maps as training data by transforming the depth prediction task into an image synthesis task. However, existing methods...


Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), day: 19, pages 3251-3266
Main author:	Zhang, Yourun (Author)
Other authors:	Gong, Maoguo, Li, Jianzhao, Zhang, Mingyang, Jiang, Fenlong, Zhao, Hongyu
Format:	Online article
Language:	English
Published:	2022
Collection access:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article


LEADER	01000caa a22002652c 4500
001	NLM339684364
003	DE-627
005	20250303063613.0
007	cr uuu---uuuuu
008	231226s2022 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2022.3167307 \|2 doi
028	5	2	\|a pubmed25n1132.xml
035			\|a (DE-627)NLM339684364
035			\|a (NLM)35439134
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Zhang, Yourun \|e verfasserin \|4 aut
245	1	0	\|a Self-Supervised Monocular Depth Estimation With Multiscale Perception
264		1	\|c 2022
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 27.04.2022
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Extracting 3D information from a single optical image is very attractive. Recently emerging self-supervised methods can learn depth representations without using ground truth depth maps as training data by transforming the depth prediction task into an image synthesis task. However, existing methods rely on a differentiable bilinear sampler for image synthesis, which results in each pixel in a synthetic image being derived from only four pixels in the source image and causes each pixel in the depth map to perceive only a few pixels in the source image. In addition, when calculating the photometric error between a synthetic image and its corresponding target image, existing methods only consider the photometric error within a small neighborhood of each single pixel and therefore ignore correlations between larger areas, which causes the model to tend to fall into the local optima for small patches. In order to extend the perceptual area of the depth map over the source image, we propose a novel multi-scale method that downsamples the predicted depth map and performs image synthesis at different resolutions, which enables each pixel in the depth map to perceive more pixels in the source image and improves the performance of the model. As for the locality of photometric error, we propose a structural similarity (SSIM) pyramid loss to allow the model to sense the difference between images in multiple areas of different sizes. Experimental results show that our method achieves superior performance on both outdoor and indoor benchmarks
650		4	\|a Journal Article
700	1		\|a Gong, Maoguo \|e verfasserin \|4 aut
700	1		\|a Li, Jianzhao \|e verfasserin \|4 aut
700	1		\|a Zhang, Mingyang \|e verfasserin \|4 aut
700	1		\|a Jiang, Fenlong \|e verfasserin \|4 aut
700	1		\|a Zhao, Hongyu \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g 31(2022) vom: 19., Seite 3251-3266 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnas
773	1	8	\|g volume:31 \|g year:2022 \|g day:19 \|g pages:3251-3266
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2022.3167307 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 31 \|j 2022 \|b 19 \|h 3251-3266