A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling

While deep convolutional neural networks have shown remarkable success in image classification, inter-class similarities, intra-class variances, the effective combination of multi-modal data, and the spatial variability in images of objects remain major challenges. To address...

Detailed Description

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - Vol. 40 (2018), No. 9, 15 Sept., pp. 2051-2065
First author: Asif, Umar (Author)
Other authors: Bennamoun, Mohammed; Sohel, Ferdous A
Format: Online article
Language: English
Published: 2018
Access to parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't
LEADER 01000caa a22002652 4500
001 NLM275386155
003 DE-627
005 20250222055149.0
007 cr uuu---uuuuu
008 231225s2018 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2017.2747134  |2 doi 
028 5 2 |a pubmed25n0917.xml 
035 |a (DE-627)NLM275386155 
035 |a (NLM)28866483 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Asif, Umar  |e verfasserin  |4 aut 
245 1 2 |a A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling 
264 1 |c 2018 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 20.11.2019 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a While deep convolutional neural networks have shown remarkable success in image classification, inter-class similarities, intra-class variances, the effective combination of multi-modal data, and the spatial variability in images of objects remain major challenges. To address these problems, this paper proposes a novel framework to learn a discriminative and spatially invariant classification model for object and indoor scene recognition using multi-modal RGB-D imagery. This is achieved through three postulates: 1) spatial invariance - achieved by combining a spatial transformer network with a deep convolutional neural network to learn features which are invariant to spatial translations, rotations, and scale changes; 2) high discriminative capability - achieved by introducing Fisher encoding within the CNN architecture to learn features which have small inter-class similarities and large intra-class compactness; and 3) multi-modal hierarchical fusion - achieved through the regularization of semantic segmentation to a multi-modal CNN architecture, where class probabilities are estimated at different hierarchical levels (i.e., image- and pixel-levels) and fused into a Conditional Random Field (CRF)-based inference hypothesis, the optimization of which produces consistent class labels in RGB-D images. Extensive experimental evaluations on RGB-D object and scene datasets, and on live video streams acquired from Kinect, show that our framework produces superior object and scene classification results compared to state-of-the-art methods 
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Bennamoun, Mohammed  |e verfasserin  |4 aut 
700 1 |a Sohel, Ferdous A  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 40(2018), 9 vom: 15. Sept., Seite 2051-2065  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:40  |g year:2018  |g number:9  |g day:15  |g month:09  |g pages:2051-2065 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2017.2747134  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 40  |j 2018  |e 9  |b 15  |c 09  |h 2051-2065
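The abstract above describes fusing class probabilities estimated at two hierarchical levels (image-level and pixel-level) via CRF-based inference. As a minimal illustration only, the sketch below replaces the paper's CRF optimization with a simple weighted log-linear fusion; the function name, the weight `alpha`, and the fusion rule are assumptions for demonstration, not the authors' formulation.

```python
import numpy as np

def fuse_probabilities(image_probs, pixel_probs, alpha=0.5):
    """Fuse image-level class probabilities (shape [C]) with
    pixel-level probabilities (shape [H, W, C]) by a weighted
    log-linear combination, renormalized per pixel.

    NOTE: a simplified stand-in for the CRF inference described in
    the abstract; `alpha` and this rule are illustrative assumptions.
    """
    eps = 1e-12  # guard against log(0)
    log_fused = (alpha * np.log(image_probs + eps)[None, None, :]
                 + (1.0 - alpha) * np.log(pixel_probs + eps))
    fused = np.exp(log_fused)
    return fused / fused.sum(axis=-1, keepdims=True)

# Toy example: a 2x2 image with 3 classes. The pixel-level map is
# uninformative (uniform), so the image-level evidence decides.
image_probs = np.array([0.7, 0.2, 0.1])      # image-level prediction
pixel_probs = np.full((2, 2, 3), 1.0 / 3.0)  # uniform pixel-level map
labels = fuse_probabilities(image_probs, pixel_probs).argmax(axis=-1)
print(labels)  # every pixel labeled with the image-level winner, class 0
```

In the paper this per-level evidence is instead combined through a CRF whose optimization enforces spatially consistent labels across the RGB-D image; the sketch only shows the per-pixel fusion step.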