Joint Task-Recursive Learning for RGB-D Scene Understanding

RGB-D scene understanding under monocular camera is an emerging and challenging topic with many potential applications. In this paper, we propose a novel Task-Recursive Learning (TRL) framework to jointly and recurrently conduct three representative tasks therein containing depth estimation, surface...

Description complète

Détails bibliographiques
Publié dans:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 42(2020), 10 vom: 02. Okt., Seite 2608-2623
Auteur principal:	Zhang, Zhenyu (Auteur)
Autres auteurs:	Cui, Zhen, Xu, Chunyan, Jie, Zequn, Li, Xiang, Yang, Jian
Format:	Article en ligne
Langue:	English
Publié:	2020
Accès à la collection:	IEEE transactions on pattern analysis and machine intelligence
Sujets:	Journal Article

Description
Résumé:	RGB-D scene understanding under monocular camera is an emerging and challenging topic with many potential applications. In this paper, we propose a novel Task-Recursive Learning (TRL) framework to jointly and recurrently conduct three representative tasks therein containing depth estimation, surface normal prediction and semantic segmentation. TRL recursively refines the prediction results through a series of task-level interactions, where one-time cross-task interaction is abstracted as one network block of one time stage. In each stage, we serialize multiple tasks into a sequence and then recursively perform their interactions. To adaptively enhance counterpart patterns, we encapsulate interactions into a specific Task-Attentional Module (TAM) to mutually-boost the tasks from each other. Across stages, the historical experiences of previous states of tasks are selectively propagated into the next stages by using Feature-Selection unit (FS-Unit), which takes advantage of complementary information across tasks. The sequence of task-level interactions is also evolved along a coarse-to-fine scale space such that the required details may be refined progressively. Finally the task-abstracted sequence problem of multi-task prediction is framed into a recursive network. Extensive experiments on NYU-Depth v2 and SUN RGB-D datasets demonstrate that our method can recursively refines the results of the triple tasks and achieves state-of-the-art performance
Description:	Date Revised 04.09.2020 published: Print-Electronic Citation Status PubMed-not-MEDLINE
ISSN:	1939-3539
DOI:	10.1109/TPAMI.2019.2926728