🤖 AI Summary
Diffusion models often generate structurally inconsistent, hallucinated samples that fall outside the support of the true data distribution, a failure attributable to excessive smoothing between modes. To address this, we propose a dynamic guidance method that operates during the generative sampling phase: it identifies artifact-sensitive directions via directional analysis of the score function and selectively sharpens the score along those directions to suppress hallucinations while preserving valid semantic interpolation and diversity. Unlike prior approaches, our method regulates hallucination directly *during* sampling, without requiring post-hoc filtering, and introduces a direction-selective mechanism for fine-grained control over inter-modal smoothing. Evaluated on both controlled and natural image datasets, our approach substantially reduces structural hallucinations, improves the structural consistency and visual fidelity of generated samples, and outperforms mainstream baselines.
📝 Abstract
Diffusion models, despite their impressive demos, often produce hallucinated samples with structural inconsistencies that lie outside the support of the true data distribution. Such hallucinations can be attributed to excessive smoothing between modes of the data distribution. However, semantic interpolations are often desirable and contribute to generation diversity, so a more nuanced solution is required. In this work, we introduce Dynamic Guidance, which mitigates hallucinations by selectively sharpening the score function only along pre-determined directions known to cause artifacts, while preserving valid semantic variations. To our knowledge, this is the first approach that addresses hallucinations at generation time rather than through post-hoc filtering. Dynamic Guidance substantially reduces hallucinations on both controlled and natural image datasets, significantly outperforming baseline methods.
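The core idea of direction-selective score sharpening can be illustrated with a minimal sketch. The paper itself does not specify this interface; the function name `dynamic_guidance`, the orthonormal `directions` matrix, and the sharpening factor `gamma` below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dynamic_guidance(score, directions, gamma=2.0):
    """Sharpen a score estimate along artifact-prone directions (illustrative sketch).

    score:      (d,) score estimate (gradient of log-density) at the current sample
    directions: (k, d) orthonormal unit vectors along which hallucinations are
                assumed to arise (the "pre-determined" directions in the abstract)
    gamma:      > 1 amplifies the score component within that subspace, steepening
                the log-density there to discourage inter-modal smoothing, while
                components orthogonal to it are untouched, preserving diversity.
    """
    # Project the score onto the artifact-sensitive subspace.
    coeffs = directions @ score            # (k,) coordinates in the subspace
    parallel = directions.T @ coeffs       # (d,) component lying in the subspace
    # Amplify only the parallel component; the orthogonal part passes through.
    return score + (gamma - 1.0) * parallel

# Example: sharpen only along the first axis of a 2-D score.
score = np.array([1.0, 1.0])
directions = np.array([[1.0, 0.0]])
print(dynamic_guidance(score, directions, gamma=3.0))  # → [3. 1.]
```

In an actual sampler, this adjustment would be applied to the model's score estimate at each denoising step before the update rule, so that hallucination is regulated at generation time rather than filtered afterwards.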