🤖 AI Summary
Diffusion models often generate hallucinated images because their learned score functions are oversmoothed, which compromises output reliability. This work formally links score smoothness to the probability mass of hallucinations from a data-density perspective, via the Lipschitz constant of the learned score function. To mitigate the issue, the authors propose Variance-Guided Score Modulation (VSM), a strategy that reduces score oversmoothing by explicitly controlling the Jacobian of the score function so that it better approximates the true score. They also introduce two benchmark datasets with extreme semantic variation to enable systematic evaluation. Experiments on synthetic and real-world datasets show that VSM reduces hallucinations by up to approximately 25% while preserving high fidelity and diversity in generated images, improving the trustworthiness of diffusion models.
📝 Abstract
Diffusion models have emerged as the backbone of modern generative AI, powering advances in vision, language, audio, and other modalities. Despite their success, they suffer from hallucinations: implausible samples that lie outside the support of the true data distribution and that degrade reliability and trust. In this work, we first empirically confirm a previously proposed hypothesis that score smoothness causes hallucinations in image-generation diffusion models, and we provide a density-based perspective on it. We then formalize this notion by linking the probability mass of hallucinations to the Lipschitz constant of the learned score function. Motivated by this, we introduce Variance-Guided Score Modulation (VSM), a strategy that controls the score Jacobian, which in turn reduces score smoothness and better approximates the ground-truth score, thereby decreasing hallucinations. Empirical results on synthetic and real-world datasets show that our approach reduces hallucinations by up to ~25% while maintaining high fidelity and diversity, providing a principled step toward more reliable diffusion-based image generation. We also propose two benchmark datasets with extreme semantic variation for systematic hallucination evaluation. Code and datasets are publicly available at https://github.com/bhosalems/VSM.
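To make the link between score smoothness and the Lipschitz constant concrete, the following toy sketch may help. It is not the VSM method and is not taken from the paper's code. It assumes a one-dimensional, equal-weight mixture of two Gaussians with means ±mu and shared variance var, for which the exact score is (mu·tanh(mu·x/var) − x)/var. It uses a moving average as a hypothetical stand-in for an oversmoothed learned score. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def mixture_score(x, mu=2.0, var=0.05):
    """Exact score d/dx log p(x) for 0.5*N(-mu, var) + 0.5*N(mu, var)."""
    return (mu * np.tanh(mu * x / var) - x) / var

def empirical_lipschitz(f_vals, x):
    """Largest finite-difference slope magnitude of f on the grid x."""
    return np.max(np.abs(np.diff(f_vals) / np.diff(x)))

# Uniform grid covering both modes and the low-density gap between them.
x = np.linspace(-4.0, 4.0, 4001)
true_score = mixture_score(x)

# Hypothetical stand-in for oversmoothing: a moving average over 401 points.
# mode="valid" drops the borders, so the grid is trimmed to match.
window = 401
kernel = np.ones(window) / window
smooth_score = np.convolve(true_score, kernel, mode="valid")
x_valid = x[window // 2 : -(window // 2)]

lip_true = empirical_lipschitz(true_score, x)
lip_smooth = empirical_lipschitz(smooth_score, x_valid)

print(f"empirical Lipschitz constant, exact score:    {lip_true:.1f}")
print(f"empirical Lipschitz constant, smoothed score: {lip_smooth:.1f}")
```

In this toy setting the exact score changes very sharply in the gap between the two modes, while the smoothed score has a much smaller maximum slope there. This is the qualitative behavior the abstract associates with samples landing between modes, outside the support of the data.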