ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Acquiring expert demonstration data on real robots is prohibitively expensive, limiting the generalization and robustness of behavior cloning. To address this challenge, this work proposes ExpertGen, a framework that leverages imperfect priors—such as human or large language model demonstrations—in simulation to enable efficient and safe policy transfer. By freezing a pretrained diffusion policy and optimizing only its initial noise, ExpertGen integrates diffusion models, reinforcement learning, and DAgger to significantly improve task success rates under sparse rewards without requiring reward engineering, while preserving behavioral similarity to human demonstrations. Experimental results demonstrate that the method achieves success rates of 90.5% in industrial assembly tasks and 85% in long-horizon manipulation tasks, substantially outperforming baseline approaches, and successfully transfers to real-world robotic deployment.

📝 Abstract
Learning generalizable and robust behavior cloning policies requires large volumes of high-quality robotics data. While human demonstrations (e.g., through teleoperation) serve as the standard source for expert behaviors, acquiring such data at scale in the real world is prohibitively expensive. This paper introduces ExpertGen, a framework that automates expert policy learning in simulation to enable scalable sim-to-real transfer. ExpertGen first initializes a behavior prior using a diffusion policy trained on imperfect demonstrations, which may be synthesized by large language models or provided by humans. Reinforcement learning is then used to steer this prior toward high task success by optimizing the diffusion model's initial noise while keeping the original policy frozen. This frozen-prior design regularizes exploration to remain within safe, human-like behavior manifolds, while also enabling effective learning with only sparse rewards. Empirical evaluations on challenging manipulation benchmarks demonstrate that ExpertGen reliably produces high-quality expert policies with no reward engineering. On industrial assembly tasks, ExpertGen achieves a 90.5% overall success rate, while on long-horizon manipulation tasks it attains 85% overall success, outperforming all baseline methods. The resulting policies exhibit dexterous control and remain robust across diverse initial configurations and failure states. To validate sim-to-real transfer, the learned state-based expert policies are further distilled into visuomotor policies via DAgger and successfully deployed on real robotic hardware.
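The core mechanism described above (keeping the pretrained diffusion policy frozen and using RL only to steer its initial noise under a sparse reward) can be illustrated with a minimal sketch. Everything here is hypothetical: `frozen_denoiser` is a toy stand-in for the pretrained policy's reverse diffusion process, the 2-D "task" is invented, and a simple REINFORCE-style score-function estimator is used in place of the paper's actual RL algorithm, which the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_denoiser(z, obs):
    # Stand-in for the pretrained diffusion policy's reverse process:
    # it deterministically maps an initial noise z (and an observation)
    # to an action. Its "weights" are never updated.
    return np.tanh(z + 0.1 * obs)

def sparse_reward(action, target):
    # Binary success signal, no reward shaping: 1 if the action lands
    # close enough to the (hypothetical) goal, else 0.
    return float(np.linalg.norm(action - target) < 0.3)

obs = np.array([0.2, -0.1])
# A goal that the frozen policy can reach for some initial noise.
target = frozen_denoiser(np.array([0.8, 0.6]), obs)

# Only the mean of the initial-noise distribution is learned;
# the diffusion policy itself stays frozen throughout.
mu = np.zeros(2)     # learnable initial-noise mean
sigma = 0.5          # fixed exploration scale
lr = 0.5

for step in range(400):
    zs = mu + sigma * rng.standard_normal((256, 2))   # sample initial noises
    acts = np.array([frozen_denoiser(z, obs) for z in zs])
    rews = np.array([sparse_reward(a, target) for a in acts])
    baseline = rews.mean()
    # Score-function (REINFORCE) gradient of E[reward] w.r.t. mu
    # under z ~ N(mu, sigma^2 I), with a mean-reward baseline.
    grad = ((rews - baseline)[:, None] * (zs - mu)).mean(axis=0) / sigma**2
    mu += lr * grad

final_action = frozen_denoiser(mu, obs)
print(np.linalg.norm(final_action - target))
```

Because only the noise distribution is optimized, every action the learner can ever produce lies on the frozen policy's output manifold, which is the sense in which this scheme constrains exploration to demonstration-like behavior even with a purely sparse reward.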
Problem

Research questions and friction points this paper is trying to address.

sim-to-real
behavior cloning
expert policy
robotics
demonstration data
Innovation

Methods, ideas, or system contributions that make the work stand out.

ExpertGen
diffusion policy
sim-to-real transfer
behavior cloning
reinforcement learning