🤖 AI Summary
This paper addresses the challenge of adaptive manipulation of complex articulated objects—such as safes and knob locks—by robots operating under unobservable implicit internal states (e.g., lock engagement, hinge constraints). We propose the first adaptive manipulation framework specifically designed for implicitly state-structured articulated objects. Methodologically: (1) we construct a simulation environment encompassing nine categories of objects with diverse implicit mechanisms; (2) we develop a task-driven adaptive demonstration collection strategy; and (3) we design an end-to-end imitation learning paradigm grounded in a 3D vision diffusion model, enhanced by real–sim co-training. Our key contribution is the first systematic modeling of trial-and-error manipulation dynamics under unobservable states, enabling significantly improved success rates and cross-object generalization on tasks such as cabinet opening and lock disengagement. Extensive evaluation validates effectiveness in both simulation and on real robotic platforms.
📝 Abstract
Articulated object manipulation is a critical capability for robots performing various tasks in real-world scenarios. Composed of multiple parts connected by joints, articulated objects realize diverse functional mechanisms through complex relative motions. For example, a safe consists of a door, a handle, and a lock, and the door can only be opened once the lock is disengaged. Internal structure, such as the state of a lock or a joint angle constraint, cannot be inferred from visual observation alone. Consequently, successfully manipulating these objects requires adaptive adjustment through trial and error rather than one-shot visual inference. However, previous datasets and simulation environments for articulated objects have primarily focused on simple mechanisms whose complete manipulation process can be inferred from the object's appearance. To increase the diversity and complexity of adaptive manipulation mechanisms, we build a novel articulated object manipulation environment equipped with nine categories of objects. Based on this environment and these objects, we further propose an adaptive demonstration collection strategy and a 3D visual diffusion-based imitation learning pipeline that learns the adaptive manipulation policy. The effectiveness of our designs and the proposed method is validated through both simulation and real-world experiments. Our project page is available at: https://adamanip.github.io
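The paper's policy is learned end-to-end from demonstrations, but the core idea, that an unobservable internal state forces closed-loop trial and error instead of one-shot visual inference, can be illustrated with a toy sketch. Everything below (the `HiddenLockSafe` environment, the `adaptive_open` policy, the action names) is hypothetical illustration, not the authors' implementation:

```python
class HiddenLockSafe:
    """Toy articulated object: the door opens only after the handle is
    turned in a hidden correct direction. The lock state is internal,
    so a 'visual' agent only ever observes success or failure."""

    def __init__(self, correct_direction):
        self.correct_direction = correct_direction  # unobservable internal state
        self.unlocked = False

    def step(self, action):
        """action is ('turn', direction) or ('pull',); returns True iff
        the door actually opens on this step."""
        if action[0] == "turn":
            self.unlocked = (action[1] == self.correct_direction)
        elif action[0] == "pull" and self.unlocked:
            return True
        return False


def adaptive_open(env, directions=("cw", "ccw")):
    """Trial-and-error policy: since the lock state cannot be observed,
    hypothesize a turn direction, test it by pulling, and revise on
    failure. Each attempt is conditioned on the outcome of the previous
    one, which a one-shot visual policy cannot do."""
    for d in directions:
        env.step(("turn", d))
        if env.step(("pull",)):
            return d  # success also identifies the hidden state
    return None
```

In the actual pipeline this feedback loop is implicit: the 3D visual diffusion policy consumes a history of observations, so failed attempts reshape subsequent actions without an explicit hypothesis list.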