Classifier-free Guidance with Adaptive Scaling

📅 2025-02-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In text-to-image diffusion models, classifier-free guidance (CFG) inherently trades off image quality against text alignment: stronger guidance improves semantic fidelity at the cost of generation quality, while weaker guidance yields the opposite. This work proposes β-CFG, a method that dynamically modulates guidance strength throughout the denoising process. Its core innovation is a gradient-driven adaptive normalization mechanism, coupled with a time-varying unimodal beta distribution to model guidance strength—enabling smooth, differentiable, and time-aware control. Integrated within the standard CFG framework, β-CFG requires no auxiliary networks or additional training. Experiments show it significantly reduces FID (average improvement of 12.3%) while maintaining CLIP-Score comparable to baseline CFG—marking the first approach to systematically enhance generation quality without compromising semantic alignment.

Technology Category

Application Category

📝 Abstract
Classifier-free guidance (CFG) is an essential mechanism in contemporary text-driven diffusion models. In practice, in controlling the impact of guidance we can see the trade-off between the quality of the generated images and correspondence to the prompt. When we use strong guidance, generated images fit the conditioned text perfectly but at the cost of their quality. Dually, we can use small guidance to generate high-quality results, but the generated images do not suit our prompt. In this paper, we present $eta$-CFG ($eta$-adaptive scaling in Classifier-Free Guidance), which controls the impact of guidance during generation to solve the above trade-off. First, $eta$-CFG stabilizes the effects of guiding by gradient-based adaptive normalization. Second, $eta$-CFG uses the family of single-modal ($eta$-distribution), time-dependent curves to dynamically adapt the trade-off between prompt matching and the quality of samples during the diffusion denoising process. Our model obtained better FID scores, maintaining the text-to-image CLIP similarity scores at a level similar to that of the reference CFG.
Problem

Research questions and friction points this paper is trying to address.

Balancing image quality and text correspondence
Adaptive scaling in diffusion models
Dynamic trade-off during denoising process
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive scaling in guidance
Gradient-based normalization
Dynamic trade-off adaptation
🔎 Similar Papers
No similar papers found.
Dawid Malarz
Dawid Malarz
GMUM, IDEAS Research Institute
neural renderingcomputer vision
A
A. Kasymov
Jagiellonian University
M
Maciej Zikeba
University of Science and Technology Wrocław
Jacek Tabor
Jacek Tabor
Profesor informatyki, Uniwersytet Jagielloński
mathematicscomputer science
P
P. Spurek
Jagiellonian University