🤖 AI Summary
Diffusion model sampling suffers from inconsistent output quality, and existing quality-enhancement methods rely on costly retraining or external reward signals. This paper proposes the first plug-and-play early sample filtering framework that requires neither model modification nor retraining, and operates without auxiliary reward signals. Our key insight is the discovery, first reported here, that under classifier-free guidance (CFG), the Accumulated Score Differences (ASD) between conditional and unconditional score trajectories exhibit a strong correlation with sample quality. Leveraging this finding, we develop an ASD modeling mechanism coupled with a dynamic early-rejection strategy to identify and terminate low-quality sampling trajectories during inference. Evaluated on GenEval and DPG-Bench, our method achieves state-of-the-art performance, significantly improving HPSv2 (+3.2) and PickScore (+4.7) while incurring negligible computational overhead.
📝 Abstract
Diffusion models often exhibit inconsistent sample quality due to stochastic variations inherent in their sampling trajectories. Although training-based fine-tuning (e.g., DDPO [1]) and inference-time alignment techniques [2] aim to improve sample fidelity, they typically necessitate full denoising processes and external reward signals. This incurs substantial computational costs, hindering their broader applicability. In this work, we unveil an intriguing phenomenon: a previously unobserved yet exploitable link between sample quality and characteristics of the denoising trajectory under classifier-free guidance (CFG). Specifically, we identify a strong correlation between high-density regions of the sample distribution and the Accumulated Score Differences (ASD), the cumulative divergence between conditional and unconditional scores. Leveraging this insight, we introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process, crucially without requiring external reward signals or model retraining. Importantly, our approach necessitates no modifications to model architectures or sampling schedules and maintains full compatibility with existing diffusion frameworks. We validate the effectiveness of CFG-Rejection on image generation through extensive experiments, demonstrating marked improvements in human preference scores (HPSv2, PickScore) and on challenging benchmarks (GenEval, DPG-Bench). We anticipate that CFG-Rejection will offer significant advantages for diverse generative modalities beyond images, paving the way for more efficient and reliable high-quality sample generation.
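To make the mechanism concrete, here is a minimal, self-contained sketch of the idea described above: during CFG sampling, accumulate the per-step divergence between conditional and unconditional scores, and terminate a trajectory early if the accumulated value signals low quality. This is an illustrative toy (a simple Euler update on NumPy arrays with placeholder score functions and a hypothetical `asd_threshold`), not the paper's actual implementation; the rejection rule's direction and threshold are assumptions for illustration.

```python
import numpy as np


def cfg_step(x, t, score_cond, score_uncond, guidance_scale):
    """One CFG-guided score evaluation.

    Returns the guided score and the norm of the conditional/unconditional
    score difference for this step (the quantity accumulated into ASD).
    """
    s_c = score_cond(x, t)
    s_u = score_uncond(x, t)
    guided = s_u + guidance_scale * (s_c - s_u)
    return guided, float(np.linalg.norm(s_c - s_u))


def sample_with_early_rejection(x0, timesteps, score_cond, score_uncond,
                                guidance_scale=7.5, check_step=10,
                                asd_threshold=1.0, step_size=0.1):
    """Toy Euler sampler with ASD-based early rejection.

    Accumulates score differences (ASD) along the trajectory; at
    `check_step` (an early point in denoising), the trajectory is
    terminated if ASD is below `asd_threshold`. Returns (sample, asd),
    with sample=None for rejected trajectories.
    """
    x, asd = np.asarray(x0, dtype=float), 0.0
    for i, t in enumerate(timesteps):
        guided, diff = cfg_step(x, t, score_cond, score_uncond, guidance_scale)
        asd += diff
        x = x + step_size * guided  # simple Euler update (placeholder sampler)
        if i == check_step and asd < asd_threshold:
            return None, asd  # early rejection: stop spending compute here
    return x, asd
```

In practice the check would compare a trajectory's ASD against statistics modeled from the ASD distribution rather than a fixed scalar threshold, but the control flow, rejecting early so the remaining denoising steps are never run, is the same.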