🤖 AI Summary
Existing flow-based image editing methods struggle to balance fidelity and controllability: higher-order solvers add model inferences, truncated inversion limits editability, and feature injection ties a method to a specific model architecture. This work proposes SteerFlow, a model-agnostic editing framework built on rectified flows. For inversion, an amortized fixed-point solver enforces velocity consistency across consecutive timesteps to obtain a high-fidelity inverted latent. For editing, trajectory interpolation blends target-editing and source-reconstruction velocities to keep the edit anchored to the source, and adaptive masking spatially restricts the editing signal to improve background preservation. Evaluated on FLUX.1-dev and Stable Diffusion 3.5 Medium, the method consistently achieves better editing quality than existing approaches, and it extends to complex multi-turn editing without accumulating drift. The framework comes with theoretical guarantees on source fidelity.
📝 Abstract
Recent advances in flow-based generative models have enabled training-free, text-guided image editing by inverting an image into its latent noise and regenerating it under new target conditioning. However, existing methods struggle to preserve source fidelity: higher-order solvers incur additional model inferences, truncated inversion constrains editability, and feature injection methods lack architectural transferability. To address these limitations, we propose SteerFlow, a model-agnostic editing framework with strong theoretical guarantees on source fidelity. In the forward process, we introduce an Amortized Fixed-Point Solver that implicitly straightens the forward trajectory by enforcing velocity consistency across consecutive timesteps, yielding a high-fidelity inverted latent. In the backward process, we introduce Trajectory Interpolation, which adaptively blends target-editing and source-reconstruction velocities to keep the editing trajectory anchored to the source. To further improve background preservation, we introduce an Adaptive Masking mechanism that spatially constrains the editing signal with concept-guided segmentation and source-target velocity differences. Extensive experiments on FLUX.1-dev and Stable Diffusion 3.5 Medium demonstrate that SteerFlow consistently achieves better editing quality than existing methods. Finally, we show that SteerFlow extends naturally to a complex multi-turn editing paradigm without accumulating drift.
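The invert-then-regenerate loop described in the abstract can be illustrated with a toy example. The sketch below is not the SteerFlow implementation: `velocity` is a hypothetical closed-form stand-in for a learned velocity network, the inversion uses plain fixed-point iteration on the implicit Euler step rather than the paper's amortized solver, and the blend weight `lam` is a fixed scalar assumed here for illustration, whereas the paper blends adaptively and adds masking. It only shows why solving the implicit step makes the backward pass retrace the forward one, and how blending source and target velocities keeps the result near the source.

```python
import numpy as np

def velocity(x, t, cond):
    # Hypothetical toy velocity field standing in for a learned model.
    return np.tanh(cond - x) * (1.0 + t)

def invert(x0, cond, steps=10, fp_iters=0):
    """Map a sample at t=0 to a latent at t=1.

    fp_iters=0 is naive explicit Euler inversion. fp_iters>0 refines each
    step by fixed-point iteration on y = x + dt * v(y, t_next), which is
    the exact inverse of the backward Euler step used in `generate`.
    """
    dt = 1.0 / steps
    x = np.array(x0, dtype=float)
    for i in range(steps):
        t, t_next = i * dt, (i + 1) * dt
        y = x + dt * velocity(x, t, cond)           # explicit initial guess
        for _ in range(fp_iters):
            y = x + dt * velocity(y, t_next, cond)  # fixed-point update
        x = y
    return x

def generate(x1, src_cond, tgt_cond, lam, steps=10):
    """Integrate from t=1 back to t=0 with a blended velocity.

    lam=0 follows the source-reconstruction velocity only; lam=1 follows
    the target-editing velocity only (fixed scalar blend, an assumption).
    """
    dt = 1.0 / steps
    x = np.array(x1, dtype=float)
    for i in range(steps, 0, -1):
        t = i * dt
        v_src = velocity(x, t, src_cond)
        v_tgt = velocity(x, t, tgt_cond)
        x = x - dt * (lam * v_tgt + (1.0 - lam) * v_src)
    return x

x0 = np.array([0.5, -1.0, 2.0])      # toy "source image"
src = np.zeros(3)                    # toy source condition
tgt = np.array([1.0, 1.0, 1.0])      # toy target condition

z_naive = invert(x0, src, fp_iters=0)
z_fixed = invert(x0, src, fp_iters=30)
err_naive = np.max(np.abs(generate(z_naive, src, tgt, lam=0.0) - x0))
err_fixed = np.max(np.abs(generate(z_fixed, src, tgt, lam=0.0) - x0))
edited = generate(z_fixed, src, tgt, lam=0.5)
print(err_naive, err_fixed, edited)
```

In this toy setting the fixed-point iteration converges because the step size times the Lipschitz constant of `velocity` is below one; a real velocity network offers no such guarantee by default, which is part of what a dedicated solver has to address.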