🤖 AI Summary
Standard classifier-free guidance (CFG) extrapolates systematically during iterative denoising and inversion-based refinement. This pushes sampling trajectories off the data manifold and makes the approximation error diverge, degrading both generation quality and control fidelity. To address this, we propose Guided Path Sampling (GPS), a paradigm for iterative refinement that replaces CFG's extrapolation with manifold-constrained interpolation. We characterize theoretically how CFG-induced extrapolation destabilizes sampling paths, and we prove that GPS keeps the cumulative error strictly bounded, in contrast to CFG's unbounded error amplification. We further design a scheduler that dynamically adjusts guidance strength so that semantic injection follows the model's coarse-to-fine generation process. GPS is architecture-agnostic and integrates with mainstream diffusion models, including SDXL and Hunyuan-DiT. Experiments show that GPS outperforms existing methods: it reaches an ImageReward of 0.79 and an HPS v2 of 0.2995 on SDXL, and raises overall semantic alignment accuracy on GenEval to 57.45%.
📝 Abstract
Iterative refinement methods based on a denoising-inversion cycle are powerful tools for enhancing the quality and control of diffusion models. However, their effectiveness is critically limited when combined with standard Classifier-Free Guidance (CFG). We identify a fundamental limitation: CFG's extrapolative nature systematically pushes the sampling path off the data manifold, causing the approximation error to diverge and undermining the refinement process. To address this, we propose Guided Path Sampling (GPS), a new paradigm for iterative refinement. GPS replaces unstable extrapolation with a principled, manifold-constrained interpolation, ensuring the sampling path remains on the data manifold. We theoretically prove that this correction transforms the error series from unbounded amplification to strictly bounded, guaranteeing stability. Furthermore, we devise an optimal scheduling strategy that dynamically adjusts guidance strength, aligning semantic injection with the model's natural coarse-to-fine generation process. Extensive experiments on modern backbones like SDXL and Hunyuan-DiT show that GPS outperforms existing methods in both perceptual quality and complex prompt adherence. For instance, GPS achieves a superior ImageReward of 0.79 and HPS v2 of 0.2995 on SDXL, while improving overall semantic alignment accuracy on GenEval to 57.45%. Our work establishes that path stability is a prerequisite for effective iterative refinement, and GPS provides a robust framework to achieve it.
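The contrast between extrapolation and interpolation can be made concrete with the standard CFG combination rule, `eps_uncond + w * (eps_cond - eps_uncond)`. The sketch below is only an illustration of that rule, not the GPS update itself, which the abstract does not spell out. The function names and the toy two-element "predictions" are assumptions made for the example. With a guidance weight `w` in `[0, 1]` the result is a convex combination that stays between the two predictions; with the larger weights typically used for CFG it lands outside that segment.

```python
def guided_prediction(eps_uncond, eps_cond, w):
    """Standard CFG rule applied element-wise: eps_uncond + w * (eps_cond - eps_uncond).

    eps_uncond and eps_cond are toy stand-ins (plain lists of floats) for the
    unconditional and conditional noise predictions of a diffusion model.
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def is_interpolation(w):
    """True when w yields a convex combination of the two predictions."""
    return 0.0 <= w <= 1.0


# Hypothetical toy predictions, chosen only to make the arithmetic easy to follow.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

# w = 0.5: interpolation, each entry stays between the two predictions.
inside = guided_prediction(eps_uncond, eps_cond, 0.5)

# w = 7.5: extrapolation, each entry overshoots the conditional prediction.
outside = guided_prediction(eps_uncond, eps_cond, 7.5)

print(inside, is_interpolation(0.5))    # [0.5, 2.0] True
print(outside, is_interpolation(7.5))   # [7.5, 16.0] False
```

In a denoising-inversion cycle this combination is applied at every step, which is why an overshoot of this kind can compound along the sampling path rather than stay a one-off deviation.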