DexPIE: Stable Dexterous Policy Improvement from Real-World Experience

πŸ“… 2026-06-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenges in imitation learning for dexterous manipulation, where high-dimensional action spaces and complex contact dynamics lead to error accumulation and heavy reliance on large expert datasets. To overcome these limitations, the authors propose DexPIE, a framework that integrates multi-stage DAgger-style data collection, a hand-adapted intervention mechanism, asynchronous inference in a relative action space, and a conditional policy guided by a continuous optimality metric. This approach enables fine-grained data utilization and stable policy improvement. Evaluated on three real-world dexterous manipulation tasks, DexPIE achieves a 37% higher success rate than the demonstration policy and significantly outperforms existing baselines, demonstrating superior robustness.
πŸ“ Abstract
Dexterous manipulation presents substantial challenges for imitation learning due to its high-dimensional action space and complex contact-rich dynamics. Policies trained purely from demonstrations often suffer from compounding errors during deployment and require large amounts of expert data to achieve reliable performance. To move beyond the limitations of demonstration data, in this work, we propose DexPIE, a post-training framework for dexterous policy improvement from experience collected through real-world deployment. First, DexPIE enables effective exploration coverage through a dexterous-hand-adapted intervention system and multi-stage DAgger-style data collection across initial and intermediate task stages, providing reliable supervision for accurate policy evaluation. To reduce temporal noise between post-training rollouts and demonstration data, we introduce asynchronous inference in the relative action space, which better aligns rollout data with demonstrated behavior and allows the critic to learn a value function induced by a more consistent underlying policy. Finally, DexPIE improves the policy through conditioning on a continuous optimality indicator, allowing the policy to leverage the quality of data in a more fine-grained manner. Across three challenging real-world dexterous manipulation tasks, DexPIE achieves a 37% improvement in success rate over the demonstration-based reference policy, outperforming all baseline methods and demonstrating stronger robustness. The source code and dataset will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

dexterous manipulation
imitation learning
compounding errors
high-dimensional action space
contact-rich dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

dexterous manipulation
policy improvement
real-world experience
asynchronous inference
optimality conditioning
πŸ”Ž Similar Papers
No similar papers found.