🤖 AI Summary
This work addresses the degradation of classifier-free guidance when class-conditional diffusion models are compressed by output-level distillation, which leaves the unconditional score branch unsupervised. To remedy this, the authors propose DASH, a dual-branch distillation framework that supervises the conditional and unconditional score branches independently. The approach adds an anchor regularizer that pulls conditional predictions toward the ground-truth noise, and TIRT Transfer, which copies the teacher's converged per-timestep importance curriculum into the student as a frozen prior. On CIFAR-10 and CIFAR-100, a 5.9× compressed student stays within 4 FID points of the teacher at 50-step DDIM sampling, considerably outperforming training from scratch while preserving guidance fidelity.
📝 Abstract
Parameter compression of class-conditional diffusion models reveals an underexplored limitation of output-level distillation: the unconditional score branch remains unsupervised, leaving the classifier-free guidance gap (the difference between the conditional and unconditional predictions) underdetermined in the student. This gap, amplified at every denoising step, admits degenerate solutions in which both branches collapse toward identical predictions, rendering guidance ineffective despite low output-level training loss. This paper introduces DASH, a dual-branch distillation framework that supervises both score branches independently, so that each training sample has a uniquely specified target for each branch, with an anchor term regularising conditional predictions toward ground-truth noise. The framework further introduces TIRT Transfer, which copies the teacher's converged per-timestep importance curriculum into the student as a frozen prior, eliminating the need to relearn it within limited distillation budgets. Experiments on CIFAR-10 and CIFAR-100 demonstrate that 5.9× compression maintains quality within 4 FID points of the teacher at 50-step DDIM sampling, considerably outperforming training from scratch, with guidance fidelity well preserved. Ablation studies confirm that unconditional supervision is the dominant contribution, accounting for over 60% of the total distillation gain. Curriculum transfer and anchor regularisation provide complementary benefits, together validating dual-branch constraints as empirically essential for guidance-preserving compression.
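The collapse described in the abstract can be illustrated with a small numerical sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: the function names, the reading of output-level distillation as a loss on the conditional prediction only, the unweighted sum of branch losses, the anchor weight `lam_anchor`, and the guidance convention `eps_uncond + w * (eps_cond - eps_uncond)`. Per-timestep importance weighting (TIRT Transfer) is omitted.

```python
import numpy as np


def cfg(eps_cond, eps_uncond, w):
    """Classifier-free guidance under one common convention (w = 1 is purely conditional)."""
    return eps_uncond + w * (eps_cond - eps_uncond)


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def conditional_only_loss(s_cond, t_cond):
    # Assumed stand-in for output-level distillation: only the conditional branch is matched.
    return mse(s_cond, t_cond)


def dual_branch_loss(s_cond, s_uncond, t_cond, t_uncond, noise, lam_anchor=0.1):
    # Hypothetical dual-branch objective: each branch has its own teacher target,
    # plus an anchor pulling the conditional prediction toward the ground-truth noise.
    return (
        mse(s_cond, t_cond)
        + mse(s_uncond, t_uncond)
        + lam_anchor * mse(s_cond, noise)
    )


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 8))                   # ground-truth noise (toy data)
t_cond = noise + 0.1 * rng.standard_normal((4, 8))    # teacher conditional prediction
t_uncond = noise + 0.5 * rng.standard_normal((4, 8))  # teacher unconditional prediction

# Collapsed student: both branches return the teacher's conditional prediction.
s_cond = t_cond.copy()
s_uncond = t_cond.copy()

loss_cond_only = conditional_only_loss(s_cond, t_cond)
loss_dual = dual_branch_loss(s_cond, s_uncond, t_cond, t_uncond, noise)

# How much the guided prediction moves when the guidance scale goes from 1 to 3.
student_shift = mse(cfg(s_cond, s_uncond, 3.0), cfg(s_cond, s_uncond, 1.0))
teacher_shift = mse(cfg(t_cond, t_uncond, 3.0), cfg(t_cond, t_uncond, 1.0))

print(f"conditional-only loss: {loss_cond_only:.4f}")
print(f"dual-branch loss:      {loss_dual:.4f}")
print(f"student guidance shift: {student_shift:.4f}")
print(f"teacher guidance shift: {teacher_shift:.4f}")
```

In this toy setup the collapsed student attains zero conditional-only loss, yet changing the guidance scale leaves its output unchanged, whereas the teacher's output moves. The dual-branch loss is strictly positive for the same student, which is the sense in which supervising the unconditional branch rules out the degenerate solution.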