EgoPriMo: Egocentric Motion Generation for Interactive Humanoid Control

📅 2026-06-07
🤖 AI Summary
Humanoid robots struggle to adapt whole-body motion to diverse scenes, tasks, and user intents because they lack scalable whole-body motion priors. This work proposes EgoPriMo, a unified framework that learns such priors from egocentric (first-person) human demonstrations, jointly modeling SMPL body dynamics, egocentric vision, and text instructions with a triple-stream DiT. Task-conditioning masks let a single checkpoint handle multiple tasks as well as inputs with missing modalities. That one model supports motion reconstruction, generation, and forecasting, and it improves egocentric motion generation over UniEgoMotion on the Nymeria and EgoExo4D benchmarks. The generated SMPL motions can also be executed by a Unitree humanoid controller, which points to practical use in real-world settings.
📝 Abstract
Humanoid robots require whole-body motions that adapt to scene context, task requirements, and user intent. Motion tracking reproduces specified trajectories, and humanoid vision-language-action systems provide semantic interfaces, but neither offers a scalable and interactive prior for broad full-body behavior. We introduce EgoPriMo (Egocentric Motion Prior for Humanoid Robots), a unified framework that learns such priors from egocentric human demonstrations. Given egocentric observations and a text prompt, EgoPriMo reconstructs, generates, and forecasts SMPL-based full-body motion. Language is used as a high-level control signal rather than a complete motion specification. At the core of EgoPriMo is a Triple-stream DiT that jointly models body dynamics, egocentric visual context, and text; task-conditioning masks route different tasks and missing-modality data through the same checkpoint. Experiments on Nymeria and EgoExo4D show that one checkpoint improves egocentric motion generation over UniEgoMotion while supporting reconstruction and forecasting; the generated SMPL motions can also be executed by a Unitree humanoid controller. These results indicate a practical path from scalable egocentric observations to generalizable and interactive humanoid motion priors.
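The abstract does not spell out how the task-conditioning masks are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes per-frame visibility masks for the vision stream, a single availability flag for text, and motion that is always the generation target. The task definitions (reconstruction sees vision at every frame, forecasting sees only the first `num_observed` frames, generation sees only the first frame) and the names `build_task_mask` and `apply_mask` are hypothetical.

```python
import numpy as np

TASKS = ("reconstruction", "generation", "forecasting")


def build_task_mask(task, num_frames, num_observed=0, has_vision=True, has_text=True):
    """Build visibility masks for one sample (hypothetical layout).

    True means the token is provided as conditioning; False means it is
    either missing or must be produced by the model.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")

    # Assumption: body motion is always the target, so it is never visible.
    motion = np.zeros(num_frames, dtype=bool)

    vision = np.zeros(num_frames, dtype=bool)
    if has_vision:
        if task == "reconstruction":
            vision[:] = True              # egocentric frames for the whole clip
        elif task == "forecasting":
            vision[:num_observed] = True  # only the observed past
        else:  # generation
            vision[:1] = True             # only the first frame

    return {"motion": motion, "vision": vision, "text": bool(has_text)}


def apply_mask(tokens, visible, null_token=0.0):
    """Replace hidden tokens with a placeholder value.

    tokens: array of shape (num_frames, dim); visible: bool array (num_frames,).
    A trained model would likely use a learned null embedding instead of 0.0.
    """
    return np.where(visible[:, None], tokens, null_token)


if __name__ == "__main__":
    mask = build_task_mask("forecasting", num_frames=8, num_observed=3)
    vision_tokens = np.ones((8, 4))
    masked = apply_mask(vision_tokens, mask["vision"])
    print(mask["vision"].astype(int))  # which frames the model may look at
    print(masked.sum(axis=1))          # hidden frames are zeroed out
```

Under these assumptions, a sample with a missing modality (say, no text prompt) is handled by the same code path as a different task: only the mask changes, which is what allows one checkpoint to serve every configuration.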
Problem

Research questions and friction points this paper is trying to address.

humanoid robots
motion prior
egocentric vision
interactive control
full-body motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Egocentric motion prior
Triple-stream DiT
SMPL-based motion generation
Task-conditioning masks
Humanoid robot control