Toward Real-World Category-Level Articulation Pose Estimation
Published in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), dated: 05., pages 1072-1083 |
---|---|
Main Author: | |
Other Authors: | , , , |
Format: | Online article |
Language: | English |
Published: | 2022 |
Access to parent work: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |
Subjects: | Journal Article |
Abstract: | Human life is populated with articulated objects. Current Category-level Articulation Pose Estimation (CAPE) methods are studied under a single-instance setting with a fixed kinematic structure for each category. To address these limitations, we study the problem of estimating part-level 6D poses for multiple articulated objects with unknown kinematic structures in a single RGB-D image, reformulate the problem setting for real-world environments, and propose the CAPE-Real (CAPER) task setting. This setting allows varied kinematic structures within a semantic category and multiple instances to co-exist in one observation of the real world. To support this task, we build an articulated model repository, ReArt-48, and present an efficient dataset generation pipeline, which comprises Fast Articulated Object Modeling (FAOM) and the Semi-Authentic MixEd Reality Technique (SAMERT). Accompanying the pipeline, we build a large-scale mixed reality dataset, ReArtMix, and a real-world dataset, ReArtVal. For the CAPER problem and these datasets, we propose an effective framework that exploits RGB-D input to estimate part-level poses for multiple instances in a single forward pass. In our method, we introduce object detection from RGB-D input to handle the multi-instance problem and segment each instance into several parts. To address the unknown kinematic structure, we propose an Articulation Parsing Network to analyze the structure of each detected instance, and we build a Pair Articulation Pose Estimation module to estimate per-part 6D poses as well as joint properties from connected part pairs. Extensive experiments demonstrate that the proposed method can achieve good performance on the CAPER, CAPE, and instance-level Robot Arm pose estimation problems. We believe it could serve as a strong baseline for future research on the CAPER task. The datasets and code from our work will be made publicly available. |
---|---|
Description: | Date Completed: 14.01.2022; Date Revised: 14.01.2022; Published: Print-Electronic; Citation Status: PubMed-not-MEDLINE |
ISSN: | 1941-0042 |
DOI: | 10.1109/TIP.2021.3138644 |