🤖 AI Summary
This paper addresses the “loss collapse” problem in conditional diffusion policy training, where ambiguous conditional semantics cause the optimization objective to degenerate into modeling the marginal action distribution. To resolve this, the authors propose Cocos, a method that integrates flow matching with semantic anchoring via a condition-dependent reparameterization of the source distribution, thereby enforcing strong conditional alignment. Cocos is the first work to formally characterize and name this phenomenon; it abandons the conventional fixed Gaussian source distribution in favor of a dynamic, language-vision-guided source distribution. Within a unified vision-language-action framework, Cocos achieves significantly accelerated convergence, reducing required gradient steps by 3–5×, while improving success rates on both simulation and real-robot tasks. Notably, its performance matches that of large-scale VLA models despite using orders of magnitude fewer parameters.
📝 Abstract
Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models. Although they demonstrate strong robot control capabilities, their training efficiency remains suboptimal. In this work, we identify a fundamental challenge in conditional diffusion policy training: when generative conditions are hard to distinguish, the training objective degenerates into modeling the marginal action distribution, a phenomenon we term loss collapse. To overcome this, we propose Cocos, a simple yet general solution that modifies the source distribution in conditional flow matching to be condition-dependent. By anchoring the source distribution around semantics extracted from condition inputs, Cocos encourages stronger condition integration and prevents loss collapse. We provide theoretical justification and extensive empirical results across simulation and real-world benchmarks. Our method achieves faster convergence and higher success rates than existing approaches, matching the performance of large-scale pre-trained VLAs with significantly fewer gradient steps and parameters. Cocos is lightweight, easy to implement, and compatible with diverse policy architectures, offering a general-purpose improvement to diffusion policy training.
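To make the core idea concrete, here is a minimal NumPy sketch of how a condition-dependent source distribution changes a conditional flow-matching training sample. This is an illustrative reconstruction, not the paper's implementation: the identity mapping `semantic_anchor` stands in for whatever learned projection Cocos uses to map condition embeddings to a source mean, and the straight-line interpolation path with velocity target `x1 - x0` is the standard flow-matching setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def semantic_anchor(cond_embedding, sigma=0.5):
    """Condition-dependent source distribution N(g(c), sigma^2 I).

    In standard flow matching the source is a fixed N(0, I); anchoring its
    mean at semantics extracted from the condition is the Cocos idea.
    Here g is a hypothetical identity projection standing in for a learned map.
    """
    return cond_embedding, sigma

def cfm_training_sample(action, cond_embedding, sigma=0.5):
    """Build one conditional flow-matching training example.

    Returns the interpolated point x_t, the sampled time t, and the
    straight-line velocity target v = x1 - x0 that the policy network
    would regress onto.
    """
    mu, s = semantic_anchor(cond_embedding, sigma)
    x0 = mu + s * rng.standard_normal(action.shape)  # sample from N(g(c), s^2 I)
    t = rng.uniform()                                # time uniformly in [0, 1]
    x_t = (1.0 - t) * x0 + t * action                # linear interpolation path
    v_target = action - x0                           # velocity regression target
    return x_t, t, v_target

# Toy usage: a 4-dim action chunk conditioned on a 4-dim embedding.
action = np.ones(4)
cond = 0.1 * np.ones(4)
x_t, t, v_target = cfm_training_sample(action, cond)
```

Because the source sample `x0` already sits near the condition's semantics, the regression target `v_target` differs across conditions even when the marginal action distribution is the same, which is what prevents the objective from collapsing onto the marginal.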