🤖 AI Summary
Diffusion-based policies in robotic manipulation often struggle to satisfy physical safety constraints and achieve high task performance at the same time. Existing approaches either impose constraints too early in training, which limits policy expressiveness, or rely on external safeguards at deployment, which hinders scalability. This work proposes PACT, a self-evolving post-training framework that projects pretrained diffusion policies onto constraint-feasible regions without requiring demonstration data or task-specific rewards. PACT combines a reverse-KL objective, distillation of constraint gradients into the diffusion model, and a curriculum that progressively tightens the constraints, while keeping the policy shift theoretically bounded and the improvement monotone. In experiments on simulated and real-world manipulation tasks, PACT reduces safety violations by 31.0% on average and improves task success by 30.7%.
📝 Abstract
Diffusion policies have achieved remarkable success in robotic manipulation, yet they often fail to satisfy the strict physical constraints required for safe deployment. Existing approaches impose safety either prematurely during training or reactively via external guardrails at test time, limiting policy expressivity and overall scalability. We propose Physical safety Alignment for Constrained Trajectories (PACT), a self-evolving post-training framework that projects pretrained diffusion policies onto constraint-feasible regions without accessing demonstration data or task rewards. PACT distills constraint gradients into the diffusion model through a reverse-KL objective with dense supervision across timesteps. It incorporates a curriculum that progressively tightens constraints while maintaining a theoretically bounded policy shift and monotone improvement, mitigating the safety-performance trade-off that arises from catastrophic forgetting. On simulated and real-world embodied manipulation benchmarks, PACT reduces safety violations by 31.0% on average while improving task success by 30.7%.
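The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea of projecting onto a feasible region under a progressively tightened constraint. It is not PACT itself: PACT distills constraint gradients into the diffusion model, whereas this sketch simply projects sampled actions. The ball-shaped feasible set, the linear schedule, the Gaussian stand-in for policy samples, and the names `tolerance_schedule`, `project_to_ball`, and `violation_rate` are all assumptions made for illustration.

```python
import math
import random


def tolerance_schedule(eps_start, eps_final, num_stages):
    """Linearly tighten a constraint tolerance from eps_start to eps_final."""
    if num_stages < 2:
        return [eps_final]
    step = (eps_start - eps_final) / (num_stages - 1)
    return [eps_start - i * step for i in range(num_stages)]


def project_to_ball(action, radius):
    """Euclidean projection of a 2-D action onto the ball ||a|| <= radius."""
    norm = math.hypot(*action)
    if norm <= radius:
        return action
    scale = radius / norm
    return (action[0] * scale, action[1] * scale)


def violation_rate(actions, radius):
    """Fraction of actions outside the ball (small slack for rounding)."""
    return sum(math.hypot(*a) > radius + 1e-9 for a in actions) / len(actions)


# Hypothetical stand-in for actions sampled from a pretrained policy.
rng = random.Random(0)
actions = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]

# Assumed curriculum: five stages, tolerance tightened from 3.0 to 1.0.
schedule = tolerance_schedule(eps_start=3.0, eps_final=1.0, num_stages=5)

before = violation_rate(actions, schedule[-1])
for eps in schedule:
    # Each stage projects onto a slightly smaller feasible set.
    actions = [project_to_ball(a, eps) for a in actions]
after = violation_rate(actions, schedule[-1])
```

Tightening in stages means each projection moves the samples only a short distance, which loosely mirrors the abstract's point about keeping the policy shift bounded at every step of the curriculum.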