🤖 AI Summary
This work addresses the challenge that diffusion-based policies often fail to discover rare yet effective behavioral patterns under scarce demonstration data, frequently converging to suboptimal solutions or generating infeasible trajectories. To overcome this limitation, the authors propose a novel framework integrating a Feynman–Kac corrector with a learnable guidance potential, which systematically steers the diffusion process toward under-explored feasible regions of the trajectory space. By coupling this guided exploration with sample-based trajectory optimization and an iterative retraining mechanism, the method continuously refines and reuses newly discovered behaviors. Empirical results demonstrate that the approach substantially enhances policy diversity, feasibility, and generalization, consistently uncovering novel and effective strategies across diverse manipulation tasks and surpassing the capabilities of conventional sampling and reinforcement learning methods.
📝 Abstract
Diffusion models have become a powerful tool for generative modeling in robotics, with diffusion policies excelling at modeling multimodal action-trajectory distributions. However, when demonstrations are limited, standard sampling often reproduces dominant behaviors while neglecting valid but rare modes, limiting the discovery of novel solutions. Existing approaches, such as guidance methods or combining reinforcement learning with diffusion, either push samples into infeasible regions or struggle to escape local minima, and so fail to systematically uncover diverse behaviors. To address these challenges, we propose a framework that combines Feynman-Kac correctors with a novel guiding potential that steers diffusion policy samples toward promising yet underrepresented regions of the trajectory space. The resulting trajectories are refined using sampling-based trajectory optimization and reincorporated into the training set to retrain the diffusion policy. Our method effectively mines and repairs novel trajectories, enabling the systematic discovery of diverse and executable behaviors. We demonstrate the effectiveness of our framework across a range of manipulation environments, consistently discovering new behaviors.
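The abstract does not spell out the sampler, so the following is only a toy sketch of the general reweight-and-resample idea that Feynman-Kac-style correctors are often built on, not the authors' implementation. Everything in it is an assumption made for illustration: the 1D "trajectories", the kernel-density `novelty_potential`, the noise-only stand-in for a reverse-diffusion step, and the parameters `lam`, `bandwidth`, and `noise`. It also omits the weight terms a real corrector would use to account for modified dynamics, as well as the trajectory-optimization and retraining stages.

```python
import numpy as np

def novelty_potential(x, demos, bandwidth=0.5):
    """Hypothetical potential: low kernel density under the demos = high novelty."""
    d2 = (x[:, None] - demos[None, :]) ** 2
    density = np.exp(-d2 / (2 * bandwidth**2)).mean(axis=1)
    return -np.log(density + 1e-12)

def resample(particles, log_w, rng):
    """Multinomial resampling in proportion to the normalized weights."""
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

def fk_steer(particles, demos, n_steps=3, lam=0.3, noise=0.05, rng=None):
    """Alternate a placeholder sampling step with potential-weighted resampling."""
    if rng is None:
        rng = np.random.default_rng(0)
    for _ in range(n_steps):
        # Stand-in for one reverse-diffusion step of a trained policy (assumption).
        particles = particles + noise * rng.standard_normal(particles.shape)
        # Weight each particle by exp(lam * potential), then resample.
        log_w = lam * novelty_potential(particles, demos)
        particles = resample(particles, log_w, rng)
    return particles

rng = np.random.default_rng(0)
# Toy demonstration set: a dominant mode near -2 and a rare mode near +2.
demos = np.concatenate([rng.normal(-2, 0.3, 95), rng.normal(2, 0.3, 5)])
# Unguided samples that imitate the same imbalance.
prior = np.concatenate([rng.normal(-2, 0.3, 950), rng.normal(2, 0.3, 50)])

steered = fk_steer(prior, demos, rng=rng)
rare_before = (prior > 0).mean()
rare_after = (steered > 0).mean()
print(f"rare-mode fraction: {rare_before:.2f} -> {rare_after:.2f}")
```

In this toy setting the resampling shifts particle mass toward the rare mode. A potential based purely on novelty also rewards the far tails of the dominant mode, which loosely mirrors the infeasibility concern the abstract raises and suggests why a separate repair stage is useful.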