Generative Model With Coordinate Metric Learning for Object Recognition Based on 3D Models

One of the bottlenecks in acquiring a perfect database for deep learning is the tedious process of collecting and labeling data. In this paper, we propose a generative model trained with synthetic images rendered from 3D models which can reduce the burden on collecting real training data and make th...

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 27(2018), 12 of 23 Dec., pages 5813-5826
Main author: Wang, Yida (author)
Other authors: Deng, Weihong
Format: Online article
Language: English
Published: 2018
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM28680932X
003 DE-627
005 20231225052606.0
007 cr uuu---uuuuu
008 231225s2018 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2018.2858553  |2 doi 
028 5 2 |a pubmed24n0956.xml 
035 |a (DE-627)NLM28680932X 
035 |a (NLM)30040643 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wang, Yida  |e verfasserin  |4 aut 
245 1 0 |a Generative Model With Coordinate Metric Learning for Object Recognition Based on 3D Models 
264 1 |c 2018 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 07.09.2018 
500 |a Date Revised 07.09.2018 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a One of the bottlenecks in acquiring a perfect database for deep learning is the tedious process of collecting and labeling data. In this paper, we propose a generative model trained with synthetic images rendered from 3D models, which can reduce the burden of collecting real training data and make the background conditions more realistic. Our architecture is composed of two sub-networks: a semantic foreground object reconstruction network based on Bayesian inference, and a classification network based on multi-triplet cost training. The latter avoids overfitting on the monotone synthetic object surface and exploits accurate information available in synthetic images, such as object poses and lighting conditions, which is helpful for recognizing regular photos. First, our generative model with metric learning uses additional foreground object channels generated by the semantic foreground object reconstruction sub-network to recognize the original input images. A multi-triplet cost function based on poses is used for metric learning, which makes it possible to train an effective categorical classifier purely on synthetic data. Second, we design a coordinate training strategy in which adaptive noise is applied to the inputs of both concatenated sub-networks, so that they benefit from each other and avoid inharmonious parameter tuning caused by the different convergence speeds of the two sub-networks. Our architecture achieves state-of-the-art accuracy of 50.5% on the ShapeNet database despite the data migration obstacle from synthetic images to real images. This pipeline makes it possible to perform recognition on real images based only on 3D models. Our code is available at https://github.com/wangyida/gm-cml 
650 4 |a Journal Article 
700 1 |a Deng, Weihong  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 27(2018), 12 vom: 23. Dez., Seite 5813-5826  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:27  |g year:2018  |g number:12  |g day:23  |g month:12  |g pages:5813-5826 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2018.2858553  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 27  |j 2018  |e 12  |b 23  |c 12  |h 5813-5826