🤖 AI Summary
Diffusion models suffer from opaque generation processes, leading to poor interpretability and susceptibility to spurious correlations and shortcut learning. To address this, we propose Patronus, a novel framework that, for the first time, integrates prototype learning into the DDPM architecture under fully self-supervised conditions (i.e., without labels or text prompts), enabling interpretable generation. Patronus combines ProtoPNet-inspired prototype representation, conditional control based on the prototype activation vector, implicit prototype learning, and reverse-diffusion path modulation. It supports prototype-driven synthesis, fine-grained image editing, and explicit detection of shortcut learning. Experiments demonstrate that Patronus significantly enhances the semantic transparency of the generation process while localizing implicit model biases and unintended semantic associations. By grounding diffusion generation in human-interpretable prototypes, Patronus establishes a new paradigm for trustworthy, controllable, and explainable diffusion modeling.
📝 Abstract
Diffusion-based generative models, such as Denoising Diffusion Probabilistic Models (DDPMs), have achieved remarkable success in image generation, but their step-by-step denoising process remains opaque, leaving critical aspects of the generation mechanism unexplained. To address this, we introduce Patronus, an interpretable diffusion model inspired by ProtoPNet. Patronus integrates a prototypical network into DDPMs, enabling the extraction of prototypes and the conditioning of the generation process on their prototype activation vector. This design enhances interpretability by showing the learned prototypes and how they influence the generation process. Additionally, the model supports downstream tasks such as image manipulation, enabling more transparent and controlled modifications. Moreover, Patronus can reveal shortcut learning in the generation process by detecting unwanted correlations between learned prototypes. Notably, Patronus operates entirely without annotations or text prompts. This work opens new avenues for understanding and controlling diffusion models through prototype-based interpretability. Our code is available at https://github.com/nina-weng/patronus.
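To make the idea of a prototype activation vector concrete, the following is a minimal sketch of how such a vector could be computed in the ProtoPNet style: each prototype is compared with every patch feature, and the strongest match becomes that prototype's activation. The abstract does not state which similarity function or network layout Patronus uses, so the log-distance similarity (taken from ProtoPNet), the function name `prototype_activation_vector`, and the toy inputs are all assumptions made for illustration.

```python
import math


def prototype_activation_vector(patch_features, prototypes, eps=1e-4):
    """Return one activation score per prototype (illustrative sketch).

    patch_features: list of feature vectors, one per spatial patch
                    (stands in for a hypothetical encoder output).
    prototypes:     list of prototype vectors of the same dimensionality.
    eps:            small constant that keeps the log finite at distance 0.
    """
    activations = []
    for proto in prototypes:
        best = -math.inf
        for feat in patch_features:
            # Squared L2 distance between this patch and the prototype.
            d2 = sum((f - p) ** 2 for f, p in zip(feat, proto))
            # ProtoPNet-style similarity: large when the distance is small.
            sim = math.log((d2 + 1.0) / (d2 + eps))
            best = max(best, sim)
        # Max over patches: "how strongly does this prototype appear anywhere?"
        activations.append(best)
    return activations


# Toy example: three 2-D patch features and two prototypes.
patches = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
protos = [[1.0, 0.0], [5.0, 5.0]]
pav = prototype_activation_vector(patches, protos)
```

In this toy case the first prototype coincides with one of the patches and receives a much higher activation than the second, which is far from every patch. In a conditional diffusion setup, a vector like `pav` would be passed to the denoiser as the conditioning signal; how Patronus does this exactly is not specified in the abstract.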