Learning Long-Horizon Robot Manipulation Skills via Privileged Action

📅 2025-02-21
🤖 AI Summary
Long-horizon, contact-rich robotic manipulation tasks suffer from poor convergence, susceptibility to local optima, and heavy reliance on hand-crafted reward functions or expert demonstrations, especially under high-dimensional state spaces and sparse rewards. Method: We propose a privileged-action-driven curriculum learning framework. In simulation, we introduce privileged actions, including virtual forces and constraint relaxation, to guide exploration without task-specific reward engineering or reference trajectories, enabling multi-stage coordinated manipulation (e.g., non-prehensile manipulation → grasping → lifting) starting from arbitrary initial poses. The policy jointly learns control actions and privileged actions via reinforcement learning. Contribution/Results: Our approach efficiently explores complex contact dynamics and achieves high-fidelity sim-to-real transfer on physical robots. Experiments demonstrate significant improvements over state-of-the-art baselines across diverse long-horizon tasks, with strong generalization and robustness to environmental variations.

📝 Abstract
Long-horizon contact-rich tasks are challenging to learn with reinforcement learning, due to ineffective exploration of high-dimensional state spaces with sparse rewards. The learning process often gets stuck in local optima and demands task-specific reward fine-tuning for complex scenarios. In this work, we propose a structured framework that leverages privileged actions with curriculum learning, enabling the policy to efficiently acquire long-horizon skills without relying on extensive reward engineering or reference trajectories. Specifically, we use privileged actions in simulation within a general training procedure that would be infeasible to implement in real-world scenarios. These privileges include relaxed constraints and virtual forces that enhance interaction and exploration with objects. Our results successfully achieve complex multi-stage long-horizon tasks that naturally combine non-prehensile manipulation with grasping to lift objects from non-graspable poses. We demonstrate generality by maintaining a parsimonious reward structure and showing convergence to diverse and robust behaviors across various environments. Additionally, real-world experiments confirm that the skills acquired using our approach transfer to real-world environments, exhibiting robust and intricate performance. Our approach outperforms state-of-the-art methods on these tasks, converging to solutions where others fail.
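To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of a privileged-action curriculum: a virtual assistive force, available only in simulation, is budgeted per curriculum stage and annealed toward zero so the final policy works without it. The class name, parameters, and linear annealing schedule are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class PrivilegedActionCurriculum:
    """Illustrative sketch (assumed, not the paper's code): anneal a
    simulation-only virtual force budget across curriculum stages so the
    policy gradually loses its privileges."""

    def __init__(self, max_force=20.0, num_stages=5):
        self.max_force = max_force    # full privilege budget at stage 0 (N)
        self.num_stages = num_stages  # stages until privileges are removed
        self.stage = 0

    def advance(self):
        # Called when the policy's success rate at the current stage is
        # high enough; moves to the next, harder stage.
        self.stage = min(self.stage + 1, self.num_stages)

    def force_scale(self):
        # Linearly anneal the privileged force budget toward zero.
        return self.max_force * (1.0 - self.stage / self.num_stages)

    def apply(self, policy_force):
        # Clip the policy's commanded virtual force to the current budget
        # before it is added to the simulated object's dynamics.
        scale = self.force_scale()
        return np.clip(policy_force, -scale, scale)

curriculum = PrivilegedActionCurriculum()
print(curriculum.force_scale())  # 20.0: full privilege at stage 0
for _ in range(curriculum.num_stages):
    curriculum.advance()
print(curriculum.force_scale())  # 0.0: privileges fully removed
```

At the final stage the clipped virtual force is always zero, so the trained policy no longer depends on any action that would be infeasible on the real robot, which is the intuition behind the sim-to-real claim.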
Problem

Research questions and friction points this paper is trying to address.

Learning long-horizon robot manipulation skills efficiently
Overcoming sparse rewards in high-dimensional state spaces
Transferring simulation-trained skills to real-world environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Privileged actions enhance exploration
Curriculum learning optimizes skill acquisition
Transferable skills to real-world environments
Authors
Xiaofeng Mao, Alibaba Group (Computer Vision, Adversarial Machine Learning)
Yucheng Xu, University of Edinburgh
Zhaole Sun, University of Edinburgh
Elle Miller, University of Edinburgh
Daniel Layeghi, University of Edinburgh
Michael Mistry, Professor of Robotics, University of Edinburgh (Robotics, Human Motor Control)