Vision-based Manipulation from Single Human Video with Open-World Object Graphs

📅 2024-05-30
🏛️ arXiv.org
📈 Citations: 40
✨ Influential: 1
🤖 AI Summary
This work addresses the challenge of enabling robots to generalize vision-based manipulation skills to unseen objects in open-world environments from a single human RGB-D demonstration video. To this end, the authors propose ORION, a one-shot imitation learning framework grounded in an open-world object graph that requires no predefined object categories or environmental priors. Its core components include object-centric modeling, RGB-D video parsing, manipulation graph extraction, plan-conditioned policy learning, and multimodal representation alignment. ORION generalizes across varying backgrounds, viewpoints, scene layouts, and previously unseen object instances, enabling robust manipulation planning and policy transfer. Experiments demonstrate that ORION significantly outperforms existing baselines on both short- and long-horizon tasks. It supports real-world deployment with videos captured on consumer-grade devices (e.g., an iPad) and transfers policies to diverse physical environments, accomplishing zero-shot manipulation of novel objects.
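To make the "open-world object graph" idea concrete, the sketch below shows one plausible shape for such a data structure: nodes hold open-vocabulary object labels with tracked 3D keypoints, edges hold pairwise relations, and a manipulation plan is a sequence of graphs at video keyframes. All names here (`ObjectNode`, `ObjectGraph`, `ManipulationPlan`) are illustrative assumptions, not ORION's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str        # open-vocabulary label, e.g. "mug" (no predefined categories)
    keypoints: list  # tracked 3D keypoints of this object over the demo video

@dataclass
class GraphEdge:
    src: str
    dst: str
    relation: str    # e.g. "on-top-of", "in-contact"

@dataclass
class ObjectGraph:
    """One keyframe's object-centric scene state."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_object(self, node: ObjectNode) -> None:
        self.nodes[node.name] = node

    def add_relation(self, src: str, dst: str, relation: str) -> None:
        self.edges.append(GraphEdge(src, dst, relation))

@dataclass
class ManipulationPlan:
    """A manipulation plan as a sequence of object graphs at keyframes."""
    keyframes: list = field(default_factory=list)

    def append_keyframe(self, graph: ObjectGraph) -> None:
        self.keyframes.append(graph)
```

A policy could then be conditioned on this plan by reading off the target inter-object relations at each keyframe, which is consistent with the summary's description of plan extraction followed by plan-conditioned policy learning.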

πŸ“ Abstract
We present an object-centric approach to empower robots to learn vision-based manipulation skills from human videos. We investigate the problem of imitating robot manipulation from a single human video in the open-world setting, where a robot must learn to manipulate novel objects from one video demonstration. We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB-D video and deriving a policy that conditions on the extracted plan. Our method enables the robot to learn from videos captured by daily mobile devices such as an iPad and generalize the policies to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances. We systematically evaluate our method on both short-horizon and long-horizon tasks, demonstrating the efficacy of ORION in learning from a single human video in the open world. Videos can be found on the project website https://ut-austin-rpl.github.io/ORION-release.
Problem

Research questions and friction points this paper is trying to address.

Learning robot manipulation from single human videos
Generalizing policies to novel objects and environments
Extracting object-centric plans from RGB-D videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-centric manipulation plan extraction
Single RGB-D video learning
Generalization to novel object instances
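One way to realize the generalization to novel object instances listed above is to match each object in the demonstration plan to a detected object in the deployment scene by visual-feature similarity, so the plan transfers even when the exact instances differ. The sketch below is a hedged illustration of that matching step only; the feature extractor (e.g., an open-vocabulary detector) and the function name `match_objects` are assumptions, not ORION's stated implementation.

```python
import numpy as np

def match_objects(plan_feats: np.ndarray, scene_feats: np.ndarray) -> list:
    """Match each plan object to its most similar object in the new scene.

    plan_feats:  (P, D) unit-normalized features of objects in the demo plan.
    scene_feats: (S, D) unit-normalized features of objects detected at test time.
    Returns, for each plan object, the index of the best-matching scene object.
    """
    sims = plan_feats @ scene_feats.T   # (P, S) cosine similarities
    return sims.argmax(axis=1).tolist()
```

For example, with two plan objects and two scene detections whose features are swapped, the matcher pairs each plan object with the correct novel instance: `match_objects([[1,0],[0,1]], [[0,1],[1,0]])` yields `[1, 0]`.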
Yifeng Zhu
The University of Texas at Austin
Arisrei Lim
The University of Texas at Austin
Peter Stone
The University of Texas at Austin, Sony AI
Yuke Zhu
The University of Texas at Austin, NVIDIA Research
Robot Learning · Computer Vision · Machine Learning · Robotics · Artificial Intelligence