LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing

📅 2025-08-05
🤖 AI Summary
In text-driven image editing based on rectified flows, semantic control is imprecise because the inverted noise inherits a semantic bias from the source image. This bias suppresses attention to the target concept and leads to editing failures or background contamination. To address this, the paper proposes a training-free, architecture-agnostic latent-space optimization method: using a pretrained rectified flow model, it directly optimizes the inverted noise under natural language guidance via gradient-based refinement, enabling precise concept replacement and localized editing. The approach is presented as the first to identify and exploit the semantic structural deficiencies inherent in inverted noise, improving edit controllability and generalizability. Evaluation on three benchmarks (PIEBench, SmartEdit, and GapEdit) reports consistent gains over state-of-the-art baselines in semantic alignment, image fidelity, and background preservation.

📝 Abstract
Text-driven image editing enables users to flexibly modify visual content through natural language instructions, and is widely applied to tasks such as semantic object replacement, insertion, and removal. While recent inversion-based editing methods using rectified flow models have achieved promising results in image quality, we identify a structural limitation in their editing behavior: the semantic bias toward the source concept encoded in the inverted noise tends to suppress attention to the target concept. This issue becomes particularly critical when the source and target semantics are dissimilar, where the attention mechanism inherently leads to editing failure or unintended modifications in non-target regions. In this paper, we systematically analyze and validate this structural flaw, and introduce LORE, a training-free and efficient image editing method. LORE directly optimizes the inverted noise, addressing the core limitations in generalization and controllability of existing approaches, enabling stable, controllable, and general-purpose concept replacement, without requiring architectural modification or model fine-tuning. We conduct comprehensive evaluations on three challenging benchmarks: PIEBench, SmartEdit, and GapEdit. Experimental results show that LORE significantly outperforms strong baselines in terms of semantic alignment, image quality, and background fidelity, demonstrating the effectiveness and scalability of latent-space optimization for general-purpose image editing.
Problem

Research questions and friction points this paper is trying to address.

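Inverted noise in rectified flow-based editing carries a semantic bias toward the source concept, which suppresses attention to the target concept
Edits fail or spill into non-target regions when the source and target concepts are dissimilar
Existing approaches are limited in generalization and controllability; the goal is to improve both without retraining the model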
Innovation

Methods, ideas, or system contributions that make the work stand out.

Directly optimizes the inverted noise via gradient-based refinement under text guidance
Training-free method that improves semantic alignment with the target concept
Requires no architectural modification or model fine-tuning
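
As a rough illustration of what "optimizing the inverted noise" can mean, the sketch below uses a toy 2-D linear velocity field in place of a pretrained rectified flow model. Everything in it is an assumption made for illustration: the function names (`velocity`, `generate`, `invert`, `optimize_noise`), the loss (alignment with the target concept plus a penalty for drifting from the inverted noise), and the finite-difference gradient. It is not the objective or implementation used by LORE.

```python
# Toy illustration only: a linear "velocity field" stands in for a pretrained
# rectified flow model. All names and the loss are hypothetical.

def velocity(x, cond):
    """Hypothetical velocity field that pulls the state toward a concept vector."""
    return [c - xi for xi, c in zip(x, cond)]

def generate(z, cond, steps=10):
    """Euler-integrate from noise (t=0) to image (t=1) under condition `cond`."""
    h = 1.0 / steps
    x = list(z)
    for _ in range(steps):
        v = velocity(x, cond)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x

def invert(image, cond, steps=10):
    """Approximate inversion: Euler-integrate the same field backward in time."""
    h = 1.0 / steps
    x = list(image)
    for _ in range(steps):
        v = velocity(x, cond)
        x = [xi - h * vi for xi, vi in zip(x, v)]
    return x

def loss(z, z_inv, tgt_cond, weight=0.1):
    """Assumed objective: match the target concept while staying near z_inv."""
    out = generate(z, tgt_cond)
    align = sum((o - c) ** 2 for o, c in zip(out, tgt_cond))
    stay = sum((a - b) ** 2 for a, b in zip(z, z_inv))
    return align + weight * stay

def optimize_noise(z_inv, tgt_cond, lr=0.5, iters=50, eps=1e-5):
    """Gradient descent on the noise, using central finite differences."""
    z = list(z_inv)
    for _ in range(iters):
        grad = []
        for i in range(len(z)):
            zp = list(z)
            zp[i] += eps
            zm = list(z)
            zm[i] -= eps
            diff = loss(zp, z_inv, tgt_cond) - loss(zm, z_inv, tgt_cond)
            grad.append(diff / (2 * eps))
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

source_image = [0.9, 0.2]   # toy "image" close to the source concept
src_cond = [1.0, 0.0]       # toy source-concept embedding
tgt_cond = [0.0, 1.0]       # toy target-concept embedding

z_inv = invert(source_image, src_cond)   # noise that still encodes the source
z_opt = optimize_noise(z_inv, tgt_cond)  # refined noise
before = loss(z_inv, z_inv, tgt_cond)
after = loss(z_opt, z_inv, tgt_cond)
```

In this toy setting, generating from `z_opt` under `tgt_cond` lands closer to the target concept than generating from `z_inv`, while the penalty term keeps `z_opt` near the inverted noise. A real model would use automatic differentiation through the sampler rather than finite differences.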