EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets

📅 2026-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Learning dexterous robotic manipulation from a single egocentric RGB-D video of a human demonstration is difficult because object pose, geometry, and contact information are often missing, and existing approaches typically depend on pre-scanned object assets. This work proposes EgoAERO, which the authors present as the first framework to learn dexterous manipulation from a single egocentric demonstration without object assets. It recovers contact-consistent hand-object trajectories through asset-free object tracking and reconstruction, ego-motion compensation, and adaptive contact optimization, then converts them into robot policies with two-stage residual learning. The work also introduces an online quality assessment mechanism and EgoDex-R, a large-scale egocentric dataset with 4.3M RGB-D frames. In simulation and real-world experiments, the method enables single-demonstration dexterous manipulation and achieves downstream performance close to that of CAD-based reconstructions on HOI4D.

📝 Abstract

Egocentric RGB-D videos offer a natural source of human dexterous manipulation demonstrations, but existing data is difficult to use for robot learning because object pose, geometry, and contact information are often missing or require pre-scanned object assets. We present EgoAERO, the first framework that learns dexterous manipulation from a single egocentric RGB-D human demonstration without object assets. EgoAERO reconstructs contact-consistent hand-object trajectories through asset-free object tracking and reconstruction, ego-motion compensation, and adaptive contact optimization, then converts them into robot policies using two-stage residual learning. We further introduce an online quality assessment mechanism and construct EgoDex-R, a large-scale egocentric dataset with 4.3M RGB-D frames for dexterous policy learning. Simulation and real-world experiments show that EgoAERO enables single-demonstration dexterous manipulation and achieves downstream performance close to CAD-based reconstructions on HOI4D.
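
The abstract does not spell out how ego-motion compensation is done, so the following is only a generic illustration of the idea, not EgoAERO's implementation. It assumes that a per-frame camera-to-world pose is already available as a 4x4 rigid transform (how such poses would be estimated is outside this sketch), and it maps points observed in the moving head-mounted camera frame into a fixed world frame. The names `apply_pose` and `compensate_ego_motion` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix4 = Sequence[Sequence[float]]   # 4x4 rigid transform, row-major
Point3 = Tuple[float, float, float]


def apply_pose(T_world_cam: Matrix4, p_cam: Point3) -> Point3:
    """Map one 3D point from the camera frame into the world frame."""
    x, y, z = p_cam
    # Rotate by the upper-left 3x3 block, then add the translation column.
    wx, wy, wz = (
        T_world_cam[r][0] * x
        + T_world_cam[r][1] * y
        + T_world_cam[r][2] * z
        + T_world_cam[r][3]
        for r in range(3)
    )
    return (wx, wy, wz)


def compensate_ego_motion(
    poses: List[Matrix4], points_cam: List[Point3]
) -> List[Point3]:
    """Express one tracked point per frame in a single fixed world frame.

    poses[i] is the (assumed known) camera-to-world transform of frame i,
    and points_cam[i] is the point as observed in that frame's camera.
    """
    if len(poses) != len(points_cam):
        raise ValueError("need exactly one pose per observed point")
    return [apply_pose(T, p) for T, p in zip(poses, points_cam)]


# Toy example: a static object seen from a camera that moves 0.5 m along x.
identity = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
shifted = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
observed = [(1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]  # apparent motion in camera frame
world_track = compensate_ego_motion([identity, shifted], observed)
```

In the toy example the object appears to move in the camera frame only because the camera moved; after compensation both frames place it at the same world position, which is the property a contact-consistent hand-object trajectory would rely on.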

Problem

Research questions and friction points this paper is trying to address.

dexterous manipulation

egocentric video

object assets

robot learning

contact estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

egocentric video

asset-free reconstruction

dexterous manipulation