🤖 AI Summary
Pretrained text-to-image models often suffer from detail loss, artifact generation, and difficulty balancing source-image fidelity with textual alignment during semantic editing. To address this, we propose a coupled stochastic differential equation (SDE) framework that synchronizes the noise-driven sampling processes of the source and target images. Our method enables plug-and-play editing, without model retraining or auxiliary networks, for any SDE-based generator, including diffusion models and rectified flow models. The core idea is to exploit the correlation between the noise driving the two samples to explicitly enforce cross-image semantic consistency, yielding near-pixel-level alignment with the source. Extensive experiments demonstrate that our approach achieves high visual fidelity, strong prompt adherence, and robust editing across diverse generative models.
📝 Abstract
Editing the content of an image with a pretrained text-to-image model remains challenging. Existing methods often distort fine details or introduce unintended artifacts. We propose using coupled stochastic differential equations (coupled SDEs) to guide the sampling process of any pretrained generative model that can be sampled by solving an SDE, including diffusion and rectified flow models. By driving both the source image and the edited image with the same correlated noise, our approach steers new samples toward the desired semantics while preserving visual similarity to the source. The method works out of the box, without retraining or auxiliary networks, and achieves high prompt fidelity along with near-pixel-level consistency. These results position coupled SDEs as a simple yet powerful tool for controlled generative AI.
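To make the core mechanism concrete, here is a minimal sketch of coupled SDE sampling: two Euler-Maruyama trajectories, one for the source prompt and one for the edit prompt, are integrated with correlated Brownian increments. The drift functions `drift_src` and `drift_tgt` are hypothetical stand-ins for a pretrained model's score/velocity under each prompt, and the correlation parameter `rho` is an illustrative assumption; neither is specified by the abstract.

```python
import numpy as np

def coupled_sde_sample(drift_src, drift_tgt, x0,
                       steps=100, sigma=1.0, rho=1.0, seed=0):
    """Euler-Maruyama integration of two SDEs driven by correlated noise.

    drift_src / drift_tgt: hypothetical drift functions standing in for a
        pretrained model's score or velocity field under the source and
        edit prompts (assumption, not the paper's exact parameterization).
    rho: correlation between the two noise streams; rho=1 means both
        trajectories see identical Brownian increments.
    Returns the final source and target states.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps
    x_src = np.array(x0, dtype=float)
    x_tgt = np.array(x0, dtype=float)
    for _ in range(steps):
        # Shared Brownian increment, plus an independent component that is
        # mixed in for the target when rho < 1.
        dw = rng.normal(0.0, np.sqrt(dt), size=x_src.shape)
        dw_ind = rng.normal(0.0, np.sqrt(dt), size=x_src.shape)
        dw_tgt = rho * dw + np.sqrt(max(0.0, 1.0 - rho**2)) * dw_ind
        x_src = x_src + drift_src(x_src) * dt + sigma * dw
        x_tgt = x_tgt + drift_tgt(x_tgt) * dt + sigma * dw_tgt
    return x_src, x_tgt

# With identical drifts and rho=1, the two trajectories coincide exactly,
# which is the degenerate case underlying the "same correlated noise" idea.
drift = lambda x: -x  # toy Ornstein-Uhlenbeck-style drift
a, b = coupled_sde_sample(drift, drift, np.ones(4), steps=50, rho=1.0)
```

When the drifts differ (two prompts), the shared noise keeps the trajectories statistically tied together, so the edited sample stays close to the source wherever the drifts agree.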