🤖 AI Summary
Most recent diffusion-based approaches to image-to-image (I2I) translation and editing rely on additional training or architecture-specific adjustments. This paper proposes FGD, a training-free, black-box filtering guidance method: a lightweight, adaptive filter is applied to the input of each diffusion step based on the output of the previous step, so the intervention is independent of the model architecture and the sampler. Key contributions include: (i) architecture- and sampler-agnostic filtering guidance that needs no access to the network's internal features; (ii) more continuous adjustment of guidance strength than other approaches; and (iii) an architecture-independent perspective on the role of self-attention in recent I2I methods. The method requires no architectural modification or parameter optimization and adds negligible inference overhead. It is competitive with recent architecture-dependent approaches and can also be added to other state-of-the-art I2I methods to strengthen their structural guidance.
📝 Abstract
Recent advances in diffusion-based generative models have shown great promise for image-to-image (I2I) translation and editing. Most recent work in this space relies on additional training or architecture-specific adjustments to the diffusion process. In this work, we show that much of this low-level control can be achieved without additional training or any access to features of the diffusion model. Our method, FGD, simply applies a filter to the input of each diffusion step, chosen adaptively based on the output of the previous step. Notably, this approach does not depend on any specific architecture or sampler and requires no access to internal features of the network, making it easy to combine with other techniques, samplers, and diffusion architectures. It also has a negligible performance cost and allows for more continuous adjustment of guidance strength than other approaches. We show that FGD offers a fast and strong baseline that is competitive with recent architecture-dependent approaches. FGD can also be used as a simple add-on to enhance the structural guidance of other state-of-the-art I2I methods. Finally, our derivation of this method helps to explain the impact of self-attention, a key component of other recent architecture-specific I2I approaches, in a more architecture-independent way. Project page: https://github.com/jaclyngu/FilteredGuidedDiffusion
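To make the idea of filtering the input of each step concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the filter, so a plain box blur stands in for it here, and the additive low-frequency correction is likewise an assumption. All names (`box_blur`, `filter_guide`, `sample_with_guidance`, `denoise_step`, `strength`) are hypothetical, and images are assumed to be 2D grayscale NumPy arrays.

```python
import numpy as np


def box_blur(img, radius=2):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window with edge padding.

    Assumption: a box blur is only a stand-in for whatever filter FGD uses.
    """
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def filter_guide(prev_output, guide, strength=0.5, radius=2):
    """Nudge the low frequencies of prev_output toward those of guide.

    strength is a continuous knob: 0 leaves the input untouched,
    1 replaces its low-frequency content with the guide's.
    """
    residual = box_blur(guide, radius) - box_blur(prev_output, radius)
    return prev_output + strength * residual


def sample_with_guidance(denoise_step, x, guide, num_steps, strength=0.5):
    """Run a black-box sampler, filtering the input of every step.

    denoise_step(x, t) is treated as opaque: no weights, gradients,
    or internal features are accessed (hypothetical interface).
    """
    for t in reversed(range(num_steps)):
        x = filter_guide(x, guide, strength)  # adapt to previous output
        x = denoise_step(x, t)                # unmodified model/sampler step
    return x
```

Because the guidance touches only the tensor passed between steps, the same wrapper could in principle sit around any sampler loop; `strength` can be varied continuously, which is the kind of adjustment the abstract refers to.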