Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation

📅 2026-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies Distribution Matching Distillation (DMD) and finds that, in high-dimensional settings, distilled students reproduce the teacher's noise-data pairings instead of using the freedom to remap latent noise that distribution-level supervision allows in principle. Comparing low- and high-dimensional settings, the authors observe remapping in low dimensions and copying in high dimensions. They show that copying is neither a byproduct of adversarial objectives nor a result of teacher memorization, and their evidence suggests it is an emergent property of the student's limited geometric freedom during high-dimensional distillation. These findings clarify how distillation behaves in diffusion models and may inform the design of more flexible student architectures.
📝 Abstract
Distribution Matching Distillation (DMD) compresses pretrained diffusion models into efficient few-step generators by aligning their noised distributions across all scales. In principle, such distribution-level supervision remains agnostic to specific noise-data pairings of the teacher; this provides the student the freedom to remap latent noise, a behavior consistently observed in low-dimensional settings. Surprisingly, we find that in high-dimensional settings, distilled students spontaneously reproduce the original noise-data pairings of the teacher, a phenomenon we term copying. We demonstrate that copying is neither a byproduct of adversarial objectives nor a result of teacher memorization. Instead, our evidence suggests that copying is an emergent property arising from the limited geometric freedom of the student model during high-dimensional distillation.
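To make the difference between copying and remapping concrete, the sketch below scores how often a student's sample for a given noise vector is nearest to the teacher's sample for that same noise. This is an illustrative assumption, not the metric used in the paper: the function name `copying_rate`, the linear toy "teacher", and the two toy "students" are all hypothetical stand-ins.

```python
import numpy as np

def copying_rate(teacher_out, student_out):
    """Fraction of noise inputs whose student sample is closest (in L2)
    to the teacher sample generated from the same noise.
    Hypothetical diagnostic, not the paper's measurement."""
    t = teacher_out.reshape(len(teacher_out), -1)
    s = student_out.reshape(len(student_out), -1)
    # Pairwise squared distances between student rows and teacher rows.
    d2 = (s ** 2).sum(1)[:, None] - 2.0 * s @ t.T + (t ** 2).sum(1)[None, :]
    nearest = d2.argmin(axis=1)
    return float(np.mean(nearest == np.arange(len(s))))

rng = np.random.default_rng(0)
n, dim = 64, 32
z = rng.standard_normal((n, dim))                    # shared latent noise
W = rng.standard_normal((dim, dim)) / np.sqrt(dim)   # toy linear "teacher"
teacher_out = z @ W

# Toy student A keeps the teacher's pairing, up to a small perturbation.
copied = teacher_out + 0.01 * rng.standard_normal((n, dim))

# Toy student B produces the same set of outputs but assigns them to
# different noise vectors (a cyclic shift with no fixed points).
remapped = np.roll(z, 1, axis=0) @ W

print(copying_rate(teacher_out, copied))
print(copying_rate(teacher_out, remapped))
```

Both toy students match the teacher's output distribution on this batch, so a distribution-level objective alone would not separate them; only the pairing score does. That is the sense in which distribution matching is agnostic to noise-data pairings.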
Problem

Research questions and friction points this paper is trying to address.

Distribution Matching Distillation
copying behavior
few-step distillation
high-dimensional distillation
noise-data pairing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distribution Matching Distillation
copying behavior
few-step distillation
high-dimensional emergence
noise-data pairing