AI Summary
Diffusion model-based image editing is hampered by the difficulty of precisely identifying semantic editing directions, and existing methods often rely on extensive sampling or additional training. To address this, we propose a training-free, sampling-free analytical method that directly extracts interpretable semantic editing directions from the eigenvectors of self-attention weight matrices in pretrained diffusion models; to our knowledge, this is the first such approach. Our method uses spectral decomposition to uncover intrinsic structure in the model's parameters, eliminating the need for fine-tuning or auxiliary networks and thereby improving editing efficiency and controllability. Extensive experiments on multiple benchmark datasets demonstrate high-fidelity editing results, and our method achieves a 60% speedup in inference time over current state-of-the-art approaches. This work establishes a new paradigm for efficient, interpretable, and parameter-efficient diffusion model editing.
Abstract
Diffusion models achieve remarkable fidelity in image synthesis, yet precise control over their outputs for targeted editing remains challenging. A key step toward controllability is to identify interpretable directions in the model's latent representations that correspond to semantic attributes. Existing approaches for finding such directions typically rely on sampling large sets of images or training auxiliary networks, which limits efficiency. We propose an analytical method that derives semantic editing directions directly from the pretrained parameters of diffusion models, requiring neither additional data nor fine-tuning. Our insight is that self-attention weight matrices encode rich structural information about the data distribution learned during training. By computing the eigenvectors of these weight matrices, we obtain robust and interpretable editing directions. Experiments demonstrate that our method produces high-quality edits across multiple datasets while reducing editing time by 60% compared with current state-of-the-art approaches.
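The core operation described above, extracting candidate editing directions from the spectrum of a self-attention weight matrix, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random matrix `W` stands in for a pretrained projection weight (in practice loaded from a model checkpoint), and the choice to eigendecompose the symmetric Gram matrix `W.T @ W` rather than `W` itself is an assumption made here so that the eigenvectors are real and orthonormal.

```python
import numpy as np

# Hypothetical stand-in for a pretrained self-attention projection
# weight matrix (e.g. a query/key/value projection). In practice this
# would be read from a pretrained diffusion model's checkpoint.
rng = np.random.default_rng(0)
d = 64
W = rng.standard_normal((d, d))

# Eigendecomposition of the symmetric Gram matrix W^T W yields an
# orthonormal basis ordered by how strongly W acts along each direction.
# (Using the Gram matrix is an assumption of this sketch.)
eigvals, eigvecs = np.linalg.eigh(W.T @ W)

# Sort descending so the leading columns are the candidate
# semantic editing directions.
order = np.argsort(eigvals)[::-1]
directions = eigvecs[:, order]

# Editing then amounts to shifting a latent feature h along a
# chosen direction with some strength alpha.
h = rng.standard_normal(d)
alpha = 3.0  # edit strength (hypothetical scale)
h_edited = h + alpha * directions[:, 0]
```

No sampling or training is involved: the directions come from a single linear-algebra pass over existing parameters, which is what makes the approach fast relative to sampling- or optimization-based alternatives.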