Revelio: Interpreting and leveraging semantic information in diffusion models

📅 2024-11-23
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited semantic interpretability of diffusion models by investigating how visual semantic information is represented across network layers and denoising timesteps. Method: We propose the first mechanistic interpretability framework for diffusion models, employing k-sparse autoencoders (k-SAEs) to extract monosemantic, disentangled features; coupling them with lightweight classifiers for transfer learning on frozen diffusion features; and conducting systematic, cross-architecture (e.g., SD1.5, SDXL), cross-dataset, and text-conditioned analyses to quantify representational granularity, inductive bias, and transferability. Results: We validate strong generalization of diffusion features across four benchmark datasets and release open-source code and an interactive visualization toolkit. Our core contribution is the first hierarchical, temporal, and cross-architectural interpretable modeling of semantic structure within diffusion models.
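The feature-extraction step described above (a k-sparse autoencoder that keeps only the top-k latent activations) can be sketched in a few lines. This is a minimal numpy illustration of the k-SAE mechanism, not the paper's implementation; all names (`ksae_forward`, the dimensions, the random weights) are hypothetical stand-ins.

```python
import numpy as np

def ksae_forward(x, W_enc, b_enc, W_dec, k):
    """k-sparse autoencoder forward pass: keep only the top-k
    latent activations per sample, zero the rest, then reconstruct."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)           # ReLU pre-activations
    # indices of the (m - k) smallest activations per sample
    idx = np.argpartition(z, -k, axis=-1)[..., :-k]
    z_sparse = z.copy()
    np.put_along_axis(z_sparse, idx, 0.0, axis=-1)   # enforce k-sparsity
    x_hat = z_sparse @ W_dec                         # linear decoder
    return z_sparse, x_hat

rng = np.random.default_rng(0)
d, m, k = 16, 64, 4                                  # feature dim, latent dim, sparsity
W_enc = rng.normal(0.0, 0.1, (d, m))
b_enc = np.zeros(m)
W_dec = rng.normal(0.0, 0.1, (m, d))
x = rng.normal(size=(8, d))                          # stand-in for frozen diffusion activations
z, x_hat = ksae_forward(x, W_enc, b_enc, W_dec, k)
assert (z > 0).sum(axis=-1).max() <= k               # at most k active features per sample
```

Training such a model (minimizing reconstruction error over real diffusion activations) is what yields the monosemantic, disentangled features the summary refers to; the sketch only shows the sparsity mechanism itself.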

📝 Abstract
We study *how* rich visual semantic information is represented within various layers and denoising timesteps of different diffusion architectures. We uncover monosemantic interpretable features by leveraging k-sparse autoencoders (k-SAE). We substantiate our mechanistic interpretations via transfer learning using light-weight classifiers on off-the-shelf diffusion models' features. On 4 datasets, we demonstrate the effectiveness of diffusion features for representation learning. We provide an in-depth analysis of how different diffusion architectures, pre-training datasets, and language model conditioning impact visual representation granularity, inductive biases, and transfer learning capabilities. Our work is a critical step towards deepening the interpretability of black-box diffusion models. Code and visualizations available at: https://github.com/revelio-diffusion/revelio
Problem

Research questions and friction points this paper is trying to address.

Analyze semantic representation in diffusion model layers
Discover interpretable features using k-sparse autoencoders
Evaluate diffusion features for transfer learning effectiveness
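The transfer-learning evaluation in the last point (a lightweight classifier on frozen features) is commonly realized as a linear probe. A minimal sketch, assuming a closed-form ridge-regression probe on toy stand-in features; the paper's actual classifier and data are not specified here, and `fit_linear_probe` and the toy feature generator are hypothetical.

```python
import numpy as np

def fit_linear_probe(feats, labels, n_classes, reg=1e-2):
    """Fit a ridge-regression linear probe on frozen features:
    one-hot targets, closed-form regularized least squares."""
    Y = np.eye(n_classes)[labels]                     # one-hot targets
    d = feats.shape[1]
    W = np.linalg.solve(feats.T @ feats + reg * np.eye(d), feats.T @ Y)
    return W

rng = np.random.default_rng(1)
# toy stand-in for frozen diffusion-model features: 3 well-separated classes
centers = rng.normal(0.0, 3.0, (3, 32))
labels = rng.integers(0, 3, 200)
feats = centers[labels] + rng.normal(size=(200, 32))
W = fit_linear_probe(feats, labels, n_classes=3)
preds = (feats @ W).argmax(axis=1)
acc = (preds == labels).mean()                        # accuracy on the toy data
```

Because the backbone stays frozen, probe accuracy directly measures how linearly separable the semantic information in the diffusion features is.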
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging k-sparse autoencoders for interpretable features
Using light-weight classifiers for transfer learning
Analyzing diffusion architectures and pre-training impacts