Revisiting Articulated Parts Perception in Robot Manipulation

📅 2026-06-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes a Geometric Primary Structure (GPS) representation to improve robotic manipulation of articulated objects. GPS models both the geometric and kinematic properties of movable parts from a single RGB-D image. To support scalable, high-quality annotation, the authors develop VR-GPS, a system built on portable virtual reality devices that balances annotation efficiency with data fidelity. A heuristic manipulation policy built on GPS predictions then guides robotic task execution. The GPS model is trained on 41K frames collected for 234 articulated objects across six part classes. Without any in-domain fine-tuning, the approach achieves a 73% success rate across 270 initial configurations spanning nine objects.
πŸ“ Abstract
We are surrounded by objects with movable, articulated parts, e.g., boxes, handles, and doors. Accurate and generalizable perception of articulated parts is essential for enhancing robotic manipulation capabilities. To meet this need, recent work on articulated parts perception has followed two main directions. One line of work uses pose-based representations, which incur high manual annotation cost; in parallel, affordance-based methods extract future object motion from point tracking without additional manual effort, but suffer from low-quality data. In this paper, we propose a new representation of articulated parts, Geometric Primary Structure (GPS), an abstraction of part geometry that balances scalability and quality. For efficient and scalable data collection, GPS is integrated with a portable Virtual Reality (VR) device, so that annotating one object sequence takes only one minute. This direct human annotation provides higher quality than estimated affordances. With this efficient VR-GPS system, we collect 41K frames for 234 objects across six part classes and train a generalizable GPS model that takes a single RGB-D object image as input. For object manipulation, we deploy a heuristic policy based on GPS predictions. Without any in-domain fine-tuning, our method achieves a 73% success rate, covering 270 initial states for 9 objects. Our code, data, and reusable tool are available at https://enlighten0707.github.io/gps.
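For a sense of scale, the headline numbers can be unpacked with simple arithmetic. This assumes the 73% is a ratio rounded to the nearest percent over all 270 trials, and the per-object figure assumes the states are split evenly across the nine objects; the abstract states neither explicitly. Since "41K" is itself rounded, the frames-per-object figure is only an average estimate.

```latex
\frac{270\ \text{initial states}}{9\ \text{objects}} = 30\ \text{states per object}, \qquad
0.725 \times 270 = 195.75,\quad 0.735 \times 270 = 198.45
\;\Rightarrow\; 196 \text{ to } 198 \text{ successful trials round to } 73\%,
\qquad \frac{41\,000\ \text{frames}}{234\ \text{objects}} \approx 175\ \text{frames per object}
```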
Problem

Research questions and friction points this paper is trying to address.

articulated parts perception
robot manipulation
object affordance
geometric representation
scalable annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric Primary Structure
articulated parts perception
VR-based annotation
robotic manipulation
generalizable representation