🤖 AI Summary
This work addresses dexterous manipulation, which is limited by the high cost of collecting large-scale robot demonstration data. First-person (egocentric) human videos offer rich behavioral diversity, but they differ from robot data in both visual appearance and action representation. The paper proposes EgoEngine, a framework that, to the authors' knowledge, is the first to enable zero-shot learning of dexterous visuomotor policies directly from egocentric human videos without any real-robot demonstrations. EgoEngine combines visual replay, action mapping, and feasibility-constrained optimization to generate, end to end, temporally aligned and context-preserving robot observation videos together with task-aligned, executable action trajectories. Experiments in simulation and on real robots show that EgoEngine efficiently produces high-quality robot data from human videos and that this data can train generalizable zero-shot dexterous manipulation policies, easing the scalability bottleneck in cross-domain imitation learning.
📝 Abstract
Dexterous manipulation is limited by the cost of collecting large-scale robot demonstrations. Egocentric human videos offer a scalable source of diverse manipulation behaviors, but using them directly for robot learning requires bridging two gaps: the visual gap between human and robot observations, and the action gap between human motion and robot-executable actions. We propose EgoEngine, a scalable framework for transforming egocentric human manipulation videos into high-fidelity robot data. Given an egocentric RGB video, EgoEngine produces: (i) a high-fidelity robot observation video that replaces the human with a robot while preserving scene context and temporal alignment, and (ii) a task-aligned, executable robot action trajectory that satisfies feasibility constraints. Experiments in simulation and on real robots show that EgoEngine enables scalable conversion of human videos into robot data and, to our knowledge, demonstrates the first zero-shot visuomotor dexterous policy learning from egocentric human videos without real-robot demonstrations. Project website: https://egoengine.github.io.
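The abstract does not say how the feasibility constraints on the action trajectory are enforced. As a minimal sketch, assuming each action is a per-frame robot joint-position vector already retargeted from the human hand (a hypothetical input), the Python below clamps every frame to joint limits and bounds the per-frame change in each joint. It emits exactly one action per video frame, so the trajectory stays temporally aligned with the observation video. The function name `make_feasible`, the limits, and the step bound are illustrative assumptions; this is a simple stand-in, not EgoEngine's feasibility-constrained optimization.

```python
from typing import List, Sequence


def make_feasible(
    traj: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
    max_step: float,
) -> List[List[float]]:
    """Greedily project a joint trajectory onto simple feasibility constraints.

    Hypothetical illustration: enforces per-joint position limits
    [lower, upper] and a per-frame velocity bound |q_t - q_{t-1}| <= max_step.
    This is greedy clamping, not an optimal (e.g. QP-based) projection.
    """
    out: List[List[float]] = []
    for t, q in enumerate(traj):
        # Clamp each joint into its position limits.
        q_lim = [min(max(v, lo), hi) for v, lo, hi in zip(q, lower, upper)]
        if t > 0:
            prev = out[-1]
            # Bound each joint's change from the previous feasible frame.
            # The result lies between prev and q_lim, so it stays within limits.
            q_lim = [p + max(-max_step, min(max_step, v - p)) for v, p in zip(q_lim, prev)]
        # One output action per input frame preserves temporal alignment.
        out.append(q_lim)
    return out


if __name__ == "__main__":
    raw = [[0.0, 0.0], [0.5, 2.0], [0.6, -1.0]]  # made-up retargeted joints
    print(make_feasible(raw, [-1.0, -1.0], [1.0, 1.0], 0.3))
    # → [[0.0, 0.0], [0.3, 0.3], [0.6, 0.0]]
```

Greedy clamping is cheap and always yields a feasible trajectory, but it can drift from the human motion after a large jump; an optimization-based projection that trades tracking error against the constraints over the whole trajectory would be closer in spirit to the approach the abstract describes.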