SAM-Flow: Source-Anchored Masked Flow for Training-Free Image Editing

📅 2026-06-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses background leakage and unintended modifications in existing training-free image editing methods, which transport the whole latent representation globally. The authors propose SAM-Flow, a Source-Anchored Masked Flow framework that localizes the editable semantic regions using a scout image and token-grounded attention maps. It applies differential velocity updates only within these regions and anchors the remaining areas to the source-image latent trajectory. A time-varying source-anchored projection mechanism combines dynamic soft masks, transition regions, and temporal mask accumulation to improve spatial stability and boundary naturalness. The method is plug-and-play, requires no fine-tuning, and can be integrated with mainstream flow-matching backbones such as Stable Diffusion 3 and FLUX. According to the reported qualitative and quantitative experiments, it achieves accurate semantic edits while significantly improving background preservation, offering a simple and general paradigm for localized, training-free image editing.
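To make the mask terminology concrete, here is a minimal NumPy sketch of how a soft mask with a transition band could be derived from an attention map and accumulated over time. It is not the authors' implementation: the sigmoid soft-thresholding, the exponential-moving-average accumulation, and the names `soft_mask`, `accumulate`, `threshold`, `sharpness`, and `momentum` are all assumptions chosen for illustration, and the attention map is a synthetic Gaussian bump rather than a real token-grounded map.

```python
import numpy as np

def soft_mask(attn, threshold, sharpness):
    """Map an attention map to a soft mask in (0, 1).

    Assumed form: min-max normalize, then apply a sigmoid around `threshold`.
    A lower `sharpness` gives a wider transition band at the boundary.
    """
    a = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    return 1.0 / (1.0 + np.exp(-sharpness * (a - threshold)))

def accumulate(prev, current, momentum=0.8):
    """Assumed temporal accumulation: exponential moving average of masks."""
    return momentum * prev + (1.0 - momentum) * current

# Synthetic stand-in for an attention map: a bump centered at (8, 8).
yy, xx = np.mgrid[0:16, 0:16]
attn = np.exp(-((yy - 8) ** 2 + (xx - 8) ** 2) / 20.0)

acc = None
for t in np.linspace(0.0, 1.0, 5):
    # Time-varying sharpness (hypothetical schedule): soft early, crisp late.
    m = soft_mask(attn, threshold=0.5, sharpness=5.0 + 15.0 * t)
    acc = m if acc is None else accumulate(acc, m)
```

In this toy setup the accumulated mask stays close to 1 at the center of the bump and close to 0 far from it, with a smooth band in between that changes gradually from step to step.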
📝 Abstract
Training-free image editing has recently attracted increasing attention due to its ability to modify real images using powerful pre-trained diffusion and flow-matching models without additional training. However, existing inversion-based and differential-flow-based methods usually perform global latent transport, which inevitably propagates editing effects to non-target regions and leads to background leakage. To address this problem, we propose SAM-Flow, a source-anchored masked flow framework for localized training-free image editing. Instead of updating the whole latent representation, SAM-Flow first uses a scout image and token-grounded attention maps to localize the editable semantic regions. It then applies differential velocity updates only within these regions, while anchoring the remaining areas to the source-image latent trajectory. To further improve spatial stability and boundary naturalness, we introduce a time-varying source-anchored projection mechanism with dynamic soft masks, transition regions, and temporal mask accumulation. The proposed method is plug-and-play and can be integrated with mainstream flow-matching backbones such as Stable Diffusion 3 and FLUX without any fine-tuning. Extensive qualitative and quantitative experiments demonstrate that SAM-Flow achieves accurate semantic editing while significantly improving background preservation, providing a simple and general localized editing paradigm for training-free image editing. Code is available at: https://github.com/chwbob/Sam-Flow.
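The core idea of applying differential velocity updates only inside the editable region, while anchoring everything else to the source, can be sketched in a few lines of NumPy. This is an illustrative approximation under stated assumptions, not the released code: the function name `masked_differential_step`, the plain Euler step, the binary mask, and the random arrays standing in for the model's source-conditioned and target-conditioned velocities are all hypothetical, and the source latent is held fixed here instead of following a real source trajectory.

```python
import numpy as np

def masked_differential_step(z_edit, z_src, v_tgt, v_src, mask, dt):
    """One Euler step of a masked differential update (assumed form).

    Inside the mask, the latent moves along the velocity difference
    (v_tgt - v_src). Outside the mask, it is projected back onto the
    source latent so that the background cannot drift.
    """
    z_next = z_edit + dt * mask * (v_tgt - v_src)
    return mask * z_next + (1.0 - mask) * z_src

rng = np.random.default_rng(0)
z_src = rng.normal(size=(4, 8, 8))   # toy source latent (channels, H, W)
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0              # toy editable region

z_edit = z_src.copy()
for _ in range(10):
    # Random stand-ins for the backbone's velocity predictions.
    v_src = rng.normal(size=z_src.shape)
    v_tgt = rng.normal(size=z_src.shape)
    z_edit = masked_differential_step(z_edit, z_src, v_tgt, v_src, mask, dt=0.1)
```

After the loop, the latent outside the masked square is identical to the source latent, while the latent inside it has moved. A soft mask with values between 0 and 1 would blend the two behaviors at the boundary.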
Problem

Research questions and friction points this paper is trying to address.

training-free image editing
background leakage
localized editing
latent transport
semantic regions
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free image editing
source-anchored flow
masked latent transport
localized editing
flow-matching models
Haowang Cui
Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, School of Microelectronics, Tianjin University, Tianjin 300072, China
Rui Chen
Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, School of Microelectronics, Tianjin University, Tianjin 300072, China
Tao Luo
School of Cyber Security, Tianjin University, Tianjin 300072, China
Tao Guo
Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, School of Microelectronics, Tianjin University, Tianjin 300072, China
Zheng Qin
National University of Defense Technology
Computer Vision, Deep Learning
Jiaze Wang
Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, School of Microelectronics, Tianjin University, Tianjin 300072, China