Generative Model With Coordinate Metric Learning for Object Recognition Based on 3D Models
One of the bottlenecks in acquiring a perfect database for deep learning is the tedious process of collecting and labeling data. In this paper, we propose a generative model trained with synthetic images rendered from 3D models which can reduce the burden on collecting real training data and make th...
Veröffentlicht in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 27(2018), 12 vom: 23. Dez., Seite 5813-5826 |
---|---|
1. Verfasser: | |
Weitere Verfasser: | |
Format: | Online-Aufsatz |
Sprache: | English |
Veröffentlicht: |
2018
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |
Schlagworte: | Journal Article |
Zusammenfassung: | One of the bottlenecks in acquiring a perfect database for deep learning is the tedious process of collecting and labeling data. In this paper, we propose a generative model trained with synthetic images rendered from 3D models which can reduce the burden on collecting real training data and make the background conditions more realistic. Our architecture is composed of two sub-networks: a semantic foreground object reconstruction network based on Bayesian inference and a classification network based on multi-triplet cost training for avoiding overfitting on the monotone synthetic object surface and utilizing accurate information of synthetic images like object poses and lighting conditions which are helpful for recognizing regular photos. First, our generative model with metric learning utilizes additional foreground object channels generated from semantic foreground object reconstruction sub-network for recognizing the original input images. Multi-triplet cost function based on poses is used for metric learning which makes it possible to train an effective categorical classifier purely based on synthetic data. Second, we design a coordinate training strategy with the help of adaptive noise applied on the inputs of both of the concatenated sub-networks to make them benefit from each other and avoid inharmonious parameter tuning due to different convergence speeds of two sub-networks. Our architecture achieves the state-of-the-art accuracy of 50.5% on the ShapeNet database with data migration obstacle from synthetic images to real images. This pipeline makes it applicable to do recognition on real images only based on 3D models. Our codes are available at https://github.com/wangyida/gm-cml |
---|---|
Beschreibung: | Date Completed 07.09.2018 Date Revised 07.09.2018 published: Print-Electronic Citation Status PubMed-not-MEDLINE |
ISSN: | 1941-0042 |
DOI: | 10.1109/TIP.2018.2858553 |