EgoPriMo: Egocentric Motion Generation for Interactive Humanoid Control

📅 2026-06-07

🤖 AI Summary

Humanoid robots struggle to adapt to diverse scenes, tasks, and user intents because they lack scalable whole-body motion priors. This work proposes EgoPriMo, a unified framework that learns such priors from first-person human demonstrations, jointly modeling SMPL body dynamics, egocentric vision, and textual instructions with a Triple-stream DiT architecture. Task-conditioning masks route different tasks, as well as data with missing modalities, through a single checkpoint. That one model supports motion reconstruction, generation, and forecasting, and improves egocentric motion generation over UniEgoMotion on the Nymeria and EgoExo4D benchmarks. The generated SMPL motions can also be executed by a Unitree humanoid controller, suggesting a practical path toward real-world use.

📝 Abstract

Humanoid robots require whole-body motions that adapt to scene context, task requirements, and user intent. Motion tracking reproduces specified trajectories, and humanoid vision-language-action systems provide semantic interfaces, but neither offers a scalable and interactive prior for broad full-body behavior. We introduce EgoPriMo (Egocentric Motion Prior for Humanoid Robots), a unified framework that learns such priors from egocentric human demonstrations. Given egocentric observations and a text prompt, EgoPriMo reconstructs, generates, and forecasts SMPL-based full-body motion. Language is used as a high-level control signal rather than a complete motion specification. At the core of EgoPriMo is a Triple-stream DiT that jointly models body dynamics, egocentric visual context, and text; task-conditioning masks route different tasks and missing-modality data through the same checkpoint. Experiments on Nymeria and EgoExo4D show that one checkpoint improves egocentric motion generation over UniEgoMotion while supporting reconstruction and forecasting; the generated SMPL motions can also be executed by a Unitree humanoid controller. These results indicate a practical path from scalable egocentric observations to generalizable and interactive humanoid motion priors.
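The abstract does not spell out how the task-conditioning masks are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a task is defined by which egocentric frames the model is allowed to see: all frames for reconstruction, only the past frames for forecasting, and only the first frame for generation. The function name `build_condition_mask`, its parameters, and the per-task visibility rules are all hypothetical.

```python
from typing import Dict, List


def build_condition_mask(
    task: str,
    num_frames: int,
    num_past: int = 0,
    has_vision: bool = True,
    has_text: bool = True,
) -> Dict[str, List[bool]]:
    """Return per-stream visibility flags (True = the model may attend to it).

    Hypothetical rules, assumed for illustration only:
      reconstruction -> every egocentric frame is visible
      forecasting    -> only the first `num_past` frames are visible
      generation     -> only the first frame is visible
    A missing modality is handled by hiding that stream entirely.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")

    if task == "reconstruction":
        visible = num_frames
    elif task == "forecasting":
        if not 0 < num_past < num_frames:
            raise ValueError("forecasting needs 0 < num_past < num_frames")
        visible = num_past
    elif task == "generation":
        visible = 1
    else:
        raise ValueError(f"unknown task: {task}")

    # Missing-modality case: no egocentric frames are available at all.
    if not has_vision:
        visible = 0

    vision_mask = [i < visible for i in range(num_frames)]
    # The text prompt is a single conditioning item, either present or absent.
    text_mask = [has_text]
    return {"vision": vision_mask, "text": text_mask}


if __name__ == "__main__":
    for task, kwargs in [
        ("reconstruction", {}),
        ("forecasting", {"num_past": 2}),
        ("generation", {}),
    ]:
        print(task, build_condition_mask(task, num_frames=4, **kwargs))
```

Under this reading, one set of weights can serve every task because only the mask changes between them; a sample without video or without a prompt is handled by hiding that stream rather than by a separate model.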

Problem

Research questions and friction points this paper is trying to address.

Humanoid robots

Motion prior

Egocentric vision

Interactive control

Full-body motion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Egocentric motion prior

Triple-stream DiT

SMPL-based motion generation

Task-conditioning masks

Humanoid robot control
