EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets

📅 2026-06-06
🤖 AI Summary
Learning dexterous robotic manipulation from a single egocentric RGB-D video of a human demonstration is difficult because object pose, geometry, and contact information are usually missing, and existing approaches depend on pre-scanned object assets such as CAD models. This work proposes EgoAERO, which the authors describe as the first framework to learn dexterous manipulation from a single egocentric demonstration without object assets. It recovers contact-consistent hand-object trajectories through asset-free object tracking and reconstruction, ego-motion compensation, and adaptive contact optimization. A two-stage residual learning scheme, together with an online quality assessment mechanism, then converts these trajectories into robot policies. In simulation and real-world experiments, the method enables single-demonstration dexterous manipulation and achieves downstream performance close to that of CAD-based reconstructions on HOI4D. The work also introduces EgoDex-R, a large-scale egocentric dataset with 4.3M RGB-D frames.
📝 Abstract
Egocentric RGB-D videos offer a natural source of human dexterous manipulation demonstrations, but existing data is difficult to use for robot learning because object pose, geometry, and contact information are often missing or require pre-scanned object assets. We present EgoAERO, the first framework that learns dexterous manipulation from a single egocentric RGB-D human demonstration without object assets. EgoAERO reconstructs contact-consistent hand-object trajectories through asset-free object tracking and reconstruction, ego motion compensation, and adaptive contact optimization, then converts them into robot policies using two-stage residual learning. We further introduce an online quality assessment mechanism and construct EgoDex-R, a large-scale egocentric dataset with 4.3M RGB-D frames for dexterous policy learning. Simulation and real-world experiments show that EgoAERO enables single-demonstration dexterous manipulation and achieves downstream performance close to CAD-based reconstructions on HOI4D.
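The abstract does not spell out how the residual learning stage combines actions, so the following is only a minimal sketch of the general residual-policy idea: a base action (for example, one retargeted from the reconstructed human trajectory) is corrected by a small learned offset. The function name, the `scale` factor, and the action bounds are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def compose_residual_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    scale: float = 0.1,   # assumed residual weight, not a value from the paper
    low: float = -1.0,    # assumed lower action bound
    high: float = 1.0,    # assumed upper action bound
) -> List[float]:
    """Return the base action plus a scaled residual, clipped to [low, high]."""
    if len(base_action) != len(residual):
        raise ValueError("base_action and residual must have the same length")
    # Element-wise: add the scaled correction, then clamp to the action range.
    return [
        min(high, max(low, b + scale * r))
        for b, r in zip(base_action, residual)
    ]


if __name__ == "__main__":
    # Base action from a hypothetical retargeted trajectory, residual from a
    # hypothetical learned policy network.
    print(compose_residual_action([0.5, -0.2], [1.0, -1.0]))
```

Keeping the residual scaled down and clipped is a common design choice in residual policy learning: the learned correction can fix contact errors without drifting far from the demonstrated motion.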
Problem

Research questions and friction points this paper is trying to address.

dexterous manipulation
egocentric video
object assets
robot learning
contact estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

egocentric video
asset-free reconstruction
dexterous manipulation
contact-consistent tracking
residual policy learning
Yichen Niu
School of Astronautics, Harbin Institute of Technology
Haoran Lv
School of Astronautics, Harbin Institute of Technology
Xinrui Zhang
School of Astronautics, Harbin Institute of Technology
Xueyao Wan
School of Astronautics, Harbin Institute of Technology
Shiyu Gao
School of Astronautics, Harbin Institute of Technology
Ying Ai
School of Astronautics, Harbin Institute of Technology
Hui Xu
School of Astronautics, Harbin Institute of Technology
Yongqi Hu
School of Astronautics, Harbin Institute of Technology
Hengyi Zhang
Suzhou Research Institute, Harbin Institute of Technology
Yang Xie
Professor, UT Southwestern Medical Center
Statistical Genomics, Predictive Modeling, Precision Medicine
Zhaxizhuoma
Shanghai Jiao Tong University
Yue Zhao
School of Astronautics, Harbin Institute of Technology
Zhenshan Bing
Nanjing University / Technical University of Munich
Robotics
Yan Ding
Lumos Robotic
Jianxing Liu
Control Science and Engineering
Control theory and application