AI Summary
Diffusion model-based image editing is hampered by the difficulty of precisely identifying semantic editing directions, and existing methods often rely on extensive sampling or additional training. To address this, we propose a training-free, sampling-free analytical method that directly extracts interpretable semantic editing directions from the eigenvectors of self-attention weight matrices in pretrained diffusion models; to our knowledge, this is the first such approach. Our method uses spectral decomposition to uncover intrinsic structure in the model's parameters, eliminating the need for fine-tuning or auxiliary networks and thereby improving editing efficiency and controllability. Extensive experiments on multiple benchmark datasets demonstrate high-fidelity editing results, and our method achieves a 60% speedup in inference time over current state-of-the-art approaches. This work establishes a new paradigm for efficient, interpretable, and parameter-efficient diffusion model editing.
Abstract
Diffusion models achieve remarkable fidelity in image synthesis, yet precise control over their outputs for targeted editing remains challenging. A key step toward controllability is to identify interpretable directions in the model's latent representations that correspond to semantic attributes. Existing approaches for finding such directions typically rely on sampling large sets of images or training auxiliary networks, which limits efficiency. We propose an analytical method that derives semantic editing directions directly from the pretrained parameters of diffusion models, requiring neither additional data nor fine-tuning. Our insight is that self-attention weight matrices encode rich structural information about the data distribution learned during training. By computing the eigenvectors of these weight matrices, we obtain robust and interpretable editing directions. Experiments demonstrate that our method produces high-quality edits across multiple datasets while reducing editing time by 60% compared with current state-of-the-art approaches.
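The core operation described above, extracting candidate editing directions from the spectrum of a self-attention weight matrix, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random matrix `W` stands in for a pretrained projection weight (in practice loaded from a model checkpoint), and the choice to eigendecompose the symmetric Gram matrix `W.T @ W` rather than `W` itself is an assumption made here so that the eigenvectors are real and orthonormal.

```python
import numpy as np

# Hypothetical stand-in for a pretrained self-attention projection
# weight matrix (e.g. a query/key/value projection). In practice this
# would be read from a pretrained diffusion model's checkpoint.
rng = np.random.default_rng(0)
d = 64
W = rng.standard_normal((d, d))

# Eigendecomposition of the symmetric Gram matrix W^T W yields an
# orthonormal basis ordered by how strongly W acts along each direction.
# (Using the Gram matrix is an assumption of this sketch.)
eigvals, eigvecs = np.linalg.eigh(W.T @ W)

# Sort descending so the leading columns are the candidate
# semantic editing directions.
order = np.argsort(eigvals)[::-1]
directions = eigvecs[:, order]

# Editing then amounts to shifting a latent feature h along a
# chosen direction with some strength alpha.
h = rng.standard_normal(d)
alpha = 3.0  # edit strength (hypothetical scale)
h_edited = h + alpha * directions[:, 0]
```

No sampling or training is involved: the directions come from a single linear-algebra pass over existing parameters, which is what makes the approach fast relative to sampling- or optimization-based alternatives.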