🤖 AI Summary
Conventional two-stage training of one-step diffusion models—first training a teacher diffusion model and then distilling it into a one-step student—introduces a strong dependency on the teacher, limiting efficiency and flexibility. Method: We propose score-free distillation, a framework that removes the need for the teacher's score function and enables direct end-to-end training of one-step diffusion generators. Crucially, we establish theoretically that explicit score function estimation—and thus score matching—is unnecessary. Through feature-space analysis, we find that teacher-based weight initialization remains important, but primarily because it transfers intermediate-layer feature representations, not input-output mappings or parameter priors; accordingly, we design an ablation-driven initialization mechanism. Contribution/Results: Our method achieves one-step generation performance competitive with teacher-based distillation baselines while greatly reducing reliance on pre-trained teacher models. It suggests a new paradigm for lightweight diffusion model training with improved scalability and reduced computational overhead.
📝 Abstract
Recent advances in one-step generative models typically follow a two-stage process: first training a teacher diffusion model and then distilling it into a one-step student model. This distillation process traditionally relies on both the teacher model's score function to compute the distillation loss and its weights for student initialization. In this paper, we explore whether one-step generative models can be trained directly without this distillation process. First, we show that the teacher's score function is not essential and propose a family of distillation methods that achieve competitive results without relying on score estimation. Next, we demonstrate that initialization from teacher weights is indispensable for successful training. Surprisingly, we find that this benefit is not due to an improved "input-output" mapping but rather to the learned feature representations, which dominate distillation quality. Our findings provide a better understanding of the role of initialization in one-step model training and its impact on distillation quality.