Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Generative models often suffer from mode collapse. This work attributes the problem in part to standard Gaussian initialization, which ignores the guidance potential landscape, so many trajectories drift into the same dominant modes and output diversity suffers. The work proposes DivIn, which draws the initial noise from a guidance potential posterior instead of the plain Gaussian prior. DivIn uses Langevin dynamics to sample this posterior efficiently, which reweights the prior toward diversity-rich regions. It operates purely at inference time, works with both diffusion and flow-matching models, and is orthogonal to existing trajectory-based methods, so the two can be combined. Experiments show that DivIn substantially improves generation diversity in class-to-image and text-to-image tasks. Combining it with trajectory-based methods pushes the diversity-quality Pareto frontier beyond what either achieves alone.
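The summary does not state the posterior's exact form. A generic way to write "a Gaussian prior reweighted by a potential," together with the Langevin update that samples it, is shown below. The potential $\Phi$ (assumed larger in diversity-rich regions), the strength $\lambda$, and the step size $\eta$ are illustrative assumptions, not the paper's notation:

```latex
% Reweighted prior over the initial noise z (assumed generic form)
\tilde p(z) \propto \mathcal{N}(z; 0, I)\,\exp\!\big(\lambda\,\Phi(z)\big)

% Its score: the Gaussian term anchors z, the potential term steers it
\nabla_z \log \tilde p(z) = -z + \lambda\,\nabla_z \Phi(z)

% Unadjusted Langevin step, \xi_k \sim \mathcal{N}(0, I)
z_{k+1} = z_k + \eta\big(-z_k + \lambda\,\nabla_z \Phi(z_k)\big) + \sqrt{2\eta}\,\xi_k
```

The $-z$ term keeps samples close to the standard Gaussian. This matches the abstract's description of anchoring the noise to the valid manifold while the potential term moves it away from collapsing regions.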

📝 Abstract

Despite the remarkable fidelity of generative models, they frequently suffer from mode collapse. Existing strategies for enhancing diversity predominantly intervene during the generation trajectory. We identify a critical oversight: the standard Gaussian initialization is agnostic to the guidance potential landscape, which often causes trajectories to collapse into dominant modes. In this work, we formulate initial-noise selection as sampling from a guidance potential posterior, which effectively re-weights the prior toward diversity-rich regions. To sample from this distribution efficiently, we introduce Diversity-inducing Initialization (DivIn), which leverages Langevin dynamics to actively navigate the initialization landscape, steering the initial noise away from collapsing regions while anchoring it to the valid data manifold. Our method is an inference-time diversity enhancement compatible with both diffusion and flow matching models. Extensive experiments show that DivIn achieves superior performance in both class-to-image and text-to-image scenarios. Furthermore, because DivIn is orthogonal to trajectory-based methods, combining the two significantly expands the diversity-quality Pareto frontier beyond what either achieves in isolation.
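The abstract does not give the guidance potential or DivIn's exact update rule, so the following is only a toy sketch of the general idea: reweight a Gaussian prior over initial noise and sample it with unadjusted Langevin dynamics. The potential `Phi`, the "dominant mode" `center`, and all hyperparameters are hypothetical. In the real setting the potential would presumably be derived from the generative model and its guidance signal, which this toy ignores entirely.

```python
import numpy as np


def grad_collapse_potential(z, center, width):
    """Gradient of a HYPOTHETICAL potential Phi(z) = -exp(-||z - center||^2 / (2 width^2)).

    Phi is lowest at `center` (a stand-in for a dominant mode that
    trajectories would collapse into), so ascending Phi pushes z away from it.
    """
    diff = z - center                              # shape (n, d)
    sq = np.sum(diff**2, axis=1, keepdims=True)    # squared distances, shape (n, 1)
    return np.exp(-sq / (2 * width**2)) * diff / width**2


def langevin_init(n, d, center, lam=3.0, width=1.0, step=0.05, n_steps=500, seed=0):
    """Unadjusted Langevin sampling from p(z) proportional to N(z; 0, I) * exp(lam * Phi(z)).

    Returns n initial-noise vectors of dimension d. All parameters are
    illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, d))                # start from the usual Gaussian init
    for _ in range(n_steps):
        # Score of the reweighted prior: -z anchors to N(0, I), the
        # potential gradient steers samples away from the collapsing region.
        score = -z + lam * grad_collapse_potential(z, center, width)
        z = z + step * score + np.sqrt(2 * step) * rng.standard_normal((n, d))
    return z


# Usage: compare distances to the hypothetical dominant mode.
center = np.zeros(2)
z_gauss = np.random.default_rng(1).standard_normal((2000, 2))
z_reweighted = langevin_init(2000, 2, center)
print("Gaussian init, mean distance to mode:  ", np.linalg.norm(z_gauss - center, axis=1).mean())
print("Reweighted init, mean distance to mode:", np.linalg.norm(z_reweighted - center, axis=1).mean())
```

Setting `lam=0` recovers (up to discretization error) ordinary Gaussian initialization. Larger `lam` moves more probability mass away from the hypothetical collapsing region, and the `-z` term keeps samples within the typical Gaussian range.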

Problem

Research questions and friction points this paper is trying to address.

mode collapse

diversity

initialization

generative models

guidance potential

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diversity-inducing Initialization

guidance potential posterior

Langevin dynamics