🤖 AI Summary
In text-guided image editing, existing score distillation methods struggle to preserve both prompt fidelity and background consistency, particularly in object insertion tasks, because of severe spatial and magnitude fluctuations in their gradients; the result is high hyperparameter sensitivity and frequent editing failures. This paper proposes a fine-tuning-free localized score distillation framework with two novel mechanisms: (1) attention-driven spatial regularization, which leverages self-attention maps to confine edits to semantically relevant regions, and (2) gradient filtering with ℓ²-normalization, which suppresses outlier gradients and stabilizes gradient magnitude. Together these components stabilize the optimization process. Evaluations across multiple benchmarks demonstrate substantial improvements in prompt alignment and editing success rate, and a user study finds the method preferred over state-of-the-art approaches 58–64% of the time, with significantly better background preservation.
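The two mechanisms above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `stabilized_grad`, the outlier threshold `mean + k·std`, and the array shapes are all assumptions made for the example.

```python
import numpy as np

def stabilized_grad(grad, attn_map, k=3.0, eps=1e-8):
    """Hypothetical sketch of the two gradient stabilizers.

    grad:     raw score-distillation gradient, shape (H, W, C)
    attn_map: self-attention relevance map in [0, 1], shape (H, W),
              high where the edit should apply.
    """
    # (1) Attention-driven spatial regularization: confine the update
    #     to semantically relevant regions by masking with the map.
    g = grad * attn_map[..., None]

    # (2a) Gradient filtering: clip outlier entries whose magnitude
    #      exceeds mean + k*std of the masked gradient (assumed rule).
    mag = np.abs(g)
    thresh = mag.mean() + k * mag.std()
    g = np.clip(g, -thresh, thresh)

    # (2b) l2-normalization: enforce a stable overall update magnitude.
    return g / (np.linalg.norm(g) + eps)
```

Masked-out pixels receive no update, clipped entries can no longer dominate the step, and the final ℓ² norm of the update is fixed, which is one way to realize the magnitude stability the summary describes.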
📝 Abstract
While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering with normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, increasing the rate of successful edits while preserving the background. In a user study, participants preferred our method over state-of-the-art techniques across all three metrics, and by 58–64% overall.