🤖 AI Summary
This work addresses the limited semantic interpretability of diffusion models by investigating how visual semantic information is represented across network layers and denoising timesteps.
Method: We propose the first mechanistic interpretability framework for diffusion models, employing k-sparse autoencoders (k-SAEs) to extract monosemantic, disentangled features; coupling them with lightweight classifiers for transfer learning on frozen diffusion features; and conducting systematic, cross-architecture (e.g., SD1.5, SDXL), cross-dataset, and text-conditioned analyses to quantify representational granularity, inductive bias, and transferability.
Results: We validate strong generalization of diffusion features across four benchmark datasets and release open-source code and an interactive visualization toolkit. Our core contribution is the first hierarchical, temporal, and cross-architectural interpretable modeling of semantic structure within diffusion models.
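The transfer-learning protocol above (a lightweight classifier probing frozen diffusion features) can be sketched as a simple linear probe. This is a hedged illustration, not the paper's exact setup: the feature matrix here is a random stand-in for frozen diffusion activations, and the dimensions, class count, and closed-form least-squares fit are illustrative choices.

```python
import numpy as np

# Hypothetical linear-probe setup: features stand in for frozen
# diffusion activations; labels stand in for dataset classes.
rng = np.random.default_rng(0)
n, d, n_classes = 200, 32, 3
feats = rng.normal(size=(n, d))              # frozen features (stand-in)
labels = rng.integers(0, n_classes, size=n)  # class labels (stand-in)

# One-hot targets; closed-form least-squares probe W = pinv(X) @ Y.
# The backbone stays frozen -- only this linear map is "trained".
Y = np.eye(n_classes)[labels]
W = np.linalg.pinv(feats) @ Y

preds = (feats @ W).argmax(axis=1)
acc = (preds == labels).mean()
print(f"probe accuracy: {acc:.2f}")
```

A real probe would typically use logistic regression with regularization; the closed-form fit above just makes the frozen-features-plus-light-head structure explicit.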
📝 Abstract
We study $\textit{how}$ rich visual semantic information is represented within various layers and denoising timesteps of different diffusion architectures. We uncover monosemantic interpretable features by leveraging k-sparse autoencoders (k-SAEs). We substantiate our mechanistic interpretations via transfer learning, using lightweight classifiers on off-the-shelf diffusion models' features. On $4$ datasets, we demonstrate the effectiveness of diffusion features for representation learning. We provide an in-depth analysis of how different diffusion architectures, pre-training datasets, and language model conditioning impact visual representation granularity, inductive biases, and transfer learning capabilities. Our work is a critical step towards deepening the interpretability of black-box diffusion models. Code and visualizations are available at: https://github.com/revelio-diffusion/revelio
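The k-SAE at the heart of the method enforces sparsity by keeping only the top-k latent activations in the encoder output. The following is a minimal sketch of that forward pass under stated assumptions: the weights are random (an actual k-SAE would be trained to reconstruct diffusion features), and `d_in`, `d_latent`, and `k` are illustrative, not the paper's values.

```python
import numpy as np

def ksae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Sketch of a k-sparse autoencoder step: encode, keep the top-k
    activations (zeroing the rest), then decode a reconstruction."""
    z = x @ W_enc + b_enc               # latent pre-activations, shape (d_latent,)
    drop = np.argsort(z)[:-k]           # indices of all but the k largest
    z_sparse = z.copy()
    z_sparse[drop] = 0.0                # hard top-k sparsity constraint
    x_hat = z_sparse @ W_dec + b_dec    # reconstruction of the input feature
    return z_sparse, x_hat

# Illustrative dimensions and untrained random weights (assumptions).
rng = np.random.default_rng(0)
d_in, d_latent, k = 16, 64, 4
W_enc = 0.1 * rng.normal(size=(d_in, d_latent))
W_dec = 0.1 * rng.normal(size=(d_latent, d_in))
x = rng.normal(size=d_in)               # stand-in for a frozen diffusion feature
z, x_hat = ksae_forward(x, W_enc, np.zeros(d_latent), W_dec, np.zeros(d_in), k)
print(np.count_nonzero(z))              # at most k active latent features
```

The top-k constraint is what encourages each surviving latent unit to fire for a narrow, monosemantic concept, which is what makes the extracted features interpretable.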