🤖 AI Summary
This work addresses the problem that pre-trained bridge models for image translation often under-exploit the prior they are given. The authors propose a training-free guidance method, Prior Guidance (PG), that introduces a weak prior unseen during bridge pre-training and contrasts its denoising result with that of the seen prior, strengthening prior exploitation via a scaling factor. Building on an analysis of how the prior is exploited during the bridge process, they design frequency-modulated prior guidance (FMPG), which applies separate guidance scales to low- and high-frequency bands. For image in-painting, they construct a cascaded CFG-FMPG framework that first produces a noisy hidden representation with CFG and then uses it as the generative prior for FMPG, without compromising inference efficiency. Experiments show consistent improvements for pre-trained bridge models across diverse image translation tasks.
📝 Abstract
Guidance methods, such as classifier-free guidance (CFG) and auto-guidance (AG), have advanced noise-to-data generation in diffusion models. Recently, bridge models have introduced a data-to-data generative process that can exploit an instructive clean prior. In this work, inspired by previous methods that create a quality difference between denoising results and use it as guidance, we propose a training-free bridge guidance method, termed Prior Guidance (PG). Specifically, we introduce a weak prior, unseen during bridge pre-training, which hinders prior exploitation and thereby degrades the denoising result. We then contrast it with the seen prior to highlight and enhance prior exploitation via a scaling factor. Moreover, we analyze the underlying mechanism of prior exploitation in the bridge process and design frequency-modulated prior guidance (FMPG), which tailors the guidance scale to low- and high-frequency bands in a manner consistent with the bridge generative dynamics. To address prior exploitation in image in-painting, we develop a cascaded framework, CFG-FMPG, which first generates a noisy hidden representation via CFG and then exploits it as a generative prior with FMPG, combining their complementary strengths without compromising inference efficiency. Experiments demonstrate that our PG methods consistently improve pre-trained bridge models across diverse image translation tasks.
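The abstract does not state the guidance equation, so the following is only a minimal sketch of how contrasting a seen-prior prediction with a weak-prior prediction might look. It assumes a CFG-style extrapolation, `pred_weak + scale * (pred_seen - pred_weak)`, and an ideal FFT low-pass mask to split the bands. The function names, the `cutoff` parameter, and the band-splitting rule are all hypothetical illustrations and are not taken from the paper.

```python
import numpy as np


def split_frequency_bands(x, cutoff=0.25):
    """Split a 2D array into low- and high-frequency parts (assumed ideal mask)."""
    h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per sample
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per sample
    low_mask = np.sqrt(fy**2 + fx**2) <= cutoff
    low = np.fft.ifft2(np.fft.fft2(x) * low_mask).real
    high = x - low  # the remainder, so low + high reconstructs x
    return low, high


def prior_guidance(pred_seen, pred_weak, scale):
    """Assumed CFG-style contrast: extrapolate away from the weak-prior result."""
    return pred_weak + scale * (pred_seen - pred_weak)


def frequency_modulated_prior_guidance(pred_seen, pred_weak,
                                       scale_low, scale_high, cutoff=0.25):
    """Apply separate (hypothetical) guidance scales to the two bands."""
    diff_low, diff_high = split_frequency_bands(pred_seen - pred_weak, cutoff)
    return pred_weak + scale_low * diff_low + scale_high * diff_high


# Toy usage with random arrays standing in for denoising results.
rng = np.random.default_rng(0)
pred_seen = rng.normal(size=(8, 8))  # prediction conditioned on the seen prior
pred_weak = rng.normal(size=(8, 8))  # prediction conditioned on the weak prior
guided = frequency_modulated_prior_guidance(pred_seen, pred_weak, 1.2, 2.0)
```

In this sketch a scale of 1 reproduces the seen-prior prediction, and setting both band scales to the same value reduces the frequency-modulated variant to plain prior guidance. How the paper actually chooses the bands and scales at each bridge step is not specified in the abstract.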