🤖 AI Summary
Flow-based robotic policies often suffer from representation collapse, hindering discrimination among similar visual states and undermining multimodal action modeling. To address this, we propose the first incorporation of dispersive regularization into the flow matching framework, extending MeanFlow with a novel regularized variant operating across multiple intermediate embedding spaces. Our method requires no auxiliary networks or complex training procedures, preserving one-step generation efficiency while significantly enhancing representation diversity. It supports end-to-end training and ultra-fast inference: on the RoboMimic benchmark, it achieves a 20–40× speedup (0.07 s per inference) and improves task success rates by 10–20 percentage points (e.g., 99% on Lift). Furthermore, real-world validation on a Franka Panda robot demonstrates strong Sim2Real transfer.
📝 Abstract
The ability to learn multi-modal action distributions is indispensable for robotic manipulation policies to perform precise and robust control. Flow-based generative models have recently emerged as a promising solution for learning action distributions, offering one-step action generation and thus much higher sampling efficiency than diffusion-based methods. However, existing flow-based policies suffer from representation collapse, the inability to distinguish similar visual representations, leading to failures in precise manipulation tasks. We propose DM1 (MeanFlow with Dispersive Regularization for One-Step Robotic Manipulation), a novel flow matching framework that integrates dispersive regularization into MeanFlow to prevent collapse while maintaining one-step efficiency. DM1 employs multiple dispersive regularization variants across different intermediate embedding layers, encouraging diverse representations across training batches without introducing additional network modules or specialized training procedures. Experiments on RoboMimic benchmarks show that DM1 achieves 20-40 times faster inference (0.07s vs. 2-3.5s) and improves success rates by 10-20 percentage points, with the Lift task reaching 99% success versus the baseline's 85%. Real-robot deployment on a Franka Panda further validates that DM1 transfers effectively from simulation to the physical world. To the best of our knowledge, this is the first work to leverage representation regularization to enable flow-based policies to achieve strong performance in robotic manipulation, establishing a simple yet powerful approach for efficient and robust manipulation.
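The summary does not spell out the dispersive objective, so the following is only a minimal sketch of one common variant of dispersive regularization: an InfoNCE-style loss over a batch of intermediate embeddings that is minimized when embeddings spread apart, added to the flow matching loss with a hypothetical weight `lam`. The function name `dispersive_loss`, the temperature `tau`, and the weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dispersive_loss(z, tau=1.0):
    """InfoNCE-style dispersive regularizer (illustrative, not DM1's exact loss).

    z: (batch, dim) array of intermediate embeddings.
    Returns log-mean-exp of negative pairwise squared distances over
    off-diagonal pairs: 0 when the batch has collapsed to one point,
    and increasingly negative as embeddings disperse.
    """
    z = np.asarray(z, dtype=np.float64)
    sq = np.sum(z ** 2, axis=1)
    # Pairwise squared Euclidean distances via the expansion ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    off_diag = ~np.eye(z.shape[0], dtype=bool)  # exclude self-pairs
    return float(np.log(np.mean(np.exp(-d2[off_diag] / tau))))

def total_loss(flow_matching_loss, z, lam=0.25):
    """Combine the (externally computed) flow matching loss with the
    dispersive term; lam is a hypothetical trade-off weight."""
    return flow_matching_loss + lam * dispersive_loss(z)
```

Because the regularizer only reads embeddings already produced during the forward pass, it adds no networks or extra sampling steps, which is consistent with the summary's claim that one-step generation efficiency is preserved.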