🤖 AI Summary
This work addresses the challenge of zero-shot robotic manipulation of unseen articulated objects (such as doors, drawers, and cabinets) with diverse geometries, sizes, and articulation types in real-world settings using a single policy. The proposed system, ArticuBot, uses a hierarchical neural policy: a high-level module processes point-cloud inputs to predict end-effector sub-goals, while a low-level module executes precise motion control conditioned on the predicted sub-goal; a weighted displacement model further grounds the high-level prediction in the existing 3D structure of the scene. The policy is trained via imitation learning on 42.3k demonstrations generated in large-scale physics simulation and transfers zero-shot to real robots: it opens dozens of unseen articulated objects on a fixed table-top Franka arm (across two labs) and an X-Arm on a mobile base (in labs, lounges, and kitchens), demonstrating zero-shot generalization across object geometries, environments, and robot embodiments.
📝 Abstract
This paper presents ArticuBot, in which a single learned policy enables a robotic system to open diverse categories of unseen articulated objects in the real world. This task has long been challenging for robotics due to the large variations in the geometry, size, and articulation types of such objects. Our system, ArticuBot, consists of three parts: generating a large number of demonstrations in physics-based simulation, distilling all generated demonstrations into a point cloud-based neural policy via imitation learning, and performing zero-shot sim2real transfer to real robot systems. Utilizing sampling-based grasping and motion planning, our demonstration generation pipeline is fast and effective, generating a total of 42.3k demonstrations over 322 training articulated objects. For policy learning, we propose a novel hierarchical policy representation, in which the high-level policy learns the sub-goal for the end-effector, and the low-level policy learns how to move the end-effector conditioned on the predicted goal. We demonstrate that this hierarchical approach achieves much better object-level generalization compared to the non-hierarchical version. We further propose a novel weighted displacement model for the high-level policy that grounds the prediction in the existing 3D structure of the scene, outperforming alternative policy representations. We show that our learned policy can zero-shot transfer to three different real robot settings: a fixed table-top Franka arm across two different labs, and an X-Arm on a mobile base, opening multiple unseen articulated objects across two labs, real lounges, and kitchens. Videos and code can be found on our project website: https://articubot.github.io/.
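The weighted displacement idea above can be illustrated with a minimal sketch: each observed scene point predicts a displacement toward the end-effector sub-goal plus a scalar confidence, and the sub-goal is the confidence-weighted average of those per-point votes, so the prediction stays grounded in the observed 3D structure. The function and argument names below are illustrative, not taken from the ArticuBot codebase, and a learned network would produce the per-point displacements and logits.

```python
import numpy as np

def predict_subgoal(points: np.ndarray,
                    displacements: np.ndarray,
                    weight_logits: np.ndarray) -> np.ndarray:
    """Hypothetical weighted-displacement aggregation.

    points:        (N, 3) observed scene point cloud
    displacements: (N, 3) per-point offsets toward the goal (network output)
    weight_logits: (N,)   per-point confidence logits (network output)
    returns:       (3,)   predicted end-effector sub-goal position
    """
    logits = weight_logits - weight_logits.max()      # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()   # softmax over points
    votes = points + displacements                    # each point's goal guess
    return weights @ votes                            # weighted average vote
```

Because every vote is an offset from a real scene point, points with near-zero weight simply drop out, and the aggregate remains anchored to the geometry the camera actually observed.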