ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation

📅 2025-07-02
🤖 AI Summary
This work addresses poor structural preservation and weak text-image alignment when Rectified Flow (ReFlow) text-to-image models are used for real-image editing. We propose a training-free method that requires no user-provided mask and works even without a source prompt. Our approach first inverts the real image only up to a mid-step and uses the resulting mid-step latent to extract three key features from the intermediate representations of the multimodal transformer blocks. It then adapts attention during feature injection to improve editability and alignment with the target text. Extensive experiments on two benchmarks show better performance than nine baselines, and human evaluations confirm a strong user preference for the method.

📝 Abstract
Rectified Flow (ReFlow) text-to-image models surpass diffusion models in image quality and text alignment, but adapting ReFlow for real-image editing remains challenging. We propose a new real-image editing method for ReFlow by analyzing the intermediate representations of multimodal transformer blocks and identifying three key features. To extract these features from real images with sufficient structural preservation, we use a mid-step latent, obtained by inverting the image only up to the mid-step. We then adapt attention during feature injection to improve editability and enhance alignment with the target text. Our method is training-free, requires no user-provided mask, and can be applied even without a source prompt. Extensive experiments on two benchmarks with nine baselines demonstrate superior performance over prior methods, and human evaluations confirm a strong user preference for our approach.
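The idea of inverting "only up to the mid-step" can be illustrated with a minimal numerical sketch. This is not the paper's implementation. It assumes a convention where t = 0 is the image and t = 1 is noise, uses plain Euler integration, and replaces the learned ReFlow velocity with a hypothetical `toy_velocity` function. All names below are illustrative.

```python
import numpy as np


def euler_integrate(x, velocity, t_start, t_end, num_steps):
    """Integrate dx/dt = velocity(x, t) from t_start to t_end with Euler steps."""
    ts = np.linspace(t_start, t_end, num_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        # Step size is negative when integrating back toward the image.
        x = x + (t1 - t0) * velocity(x, t0)
    return x


def toy_velocity(x, t):
    # Hypothetical stand-in for a learned velocity field (assumption).
    return -0.5 * x


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))  # stand-in for an image latent

# Partial inversion: stop at the mid-step instead of going all the way to noise.
x_mid = euler_integrate(x0, toy_velocity, 0.0, 0.5, 50)

# Full inversion, for comparison only.
x_full = euler_integrate(x0, toy_velocity, 0.0, 1.0, 100)

# Integrating back from the mid-step approximately recovers the input.
recon = euler_integrate(x_mid, toy_velocity, 0.5, 0.0, 50)

print("drift at mid-step :", np.linalg.norm(x_mid - x0))
print("drift at full step:", np.linalg.norm(x_full - x0))
print("round-trip error  :", np.linalg.norm(recon - x0))
```

In this toy setting the mid-step latent stays closer to the input than the fully inverted one, which is the intuition behind using it for structural preservation. In an actual editor, the backward pass would use a velocity conditioned on the target text rather than the same field.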
Problem

Research questions and friction points this paper is trying to address.

Adapting ReFlow to real-image editing
Extracting key features from real images while preserving structure
Improving editability and alignment with the target text
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mid-step latent inversion for structural preservation
Attention adaptation during feature injection
Training-free real-image editing without masks