🤖 AI Summary
This work addresses the lack of tools for jointly evaluating the generative and representation-learning capabilities of diffusion models. Taking a self-supervised learning perspective, the study decomposes features into invariant and residual components and derives the Invariant Contamination Ratio (ICR), a Fisher-based metric that quantifies how residual variation contaminates the invariant signal. ICR detects the onset of memorization from training features alone, without external evaluators or held-out test sets. The analysis finds that invariance peaks at intermediate noise levels, which also give the best downstream classification performance. In data-limited regimes, ICR sensitively captures the early transition from generalization to memorization, showing that diffusion model training can be monitored through the geometry of the learned representations.
📝 Abstract
Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connection between these two abilities remains less explored. Drawing inspiration from self-supervised learning (SSL), we introduce a framework for jointly evaluating the representation and generation capabilities of diffusion models. Specifically, we decompose features into invariant and residual components and derive the Invariant Contamination Ratio (ICR), a Fisher-based metric that quantifies how residual variation contaminates invariant signal in feature space. We use this framework to analyze both discriminative and generative behavior of diffusion models. On the representation side, we find that invariance peaks at intermediate noise levels, which also yield the best downstream classification performance. On the generative side, we study how training transitions from genuine generalization to memorization in data-limited regimes, and show that ICR serves as a sensitive training-time indicator of early learning: increasing residual energy along Fisher directions marks the onset of memorization, detectable from training features alone without external evaluators or held-out test sets. Overall, our results show that diffusion models can be monitored from a self-supervised perspective through the geometry of their learned representations.
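The abstract does not give the formal definition of ICR, so the following is only an illustrative sketch of the general idea, not the authors' metric. It assumes features are available for several augmented or noised views of each training sample, takes the per-sample mean over views as the invariant component and the deviation from that mean as the residual, and approximates "Fisher directions" by Fisher discriminant directions (the generalized eigenvectors of between-sample versus within-sample scatter). The function names `decompose` and `contamination_ratio`, the `(n_samples, n_views, dim)` layout, and the parameters `k` and `eps` are all hypothetical.

```python
import numpy as np


def decompose(features):
    """Split features of shape (n_samples, n_views, dim) into two parts.

    Assumption: the invariant part is the mean over views of a sample,
    and the residual is whatever each view adds on top of that mean.
    """
    invariant = features.mean(axis=1)               # (n_samples, dim)
    residual = features - invariant[:, None, :]     # (n_samples, n_views, dim)
    return invariant, residual


def contamination_ratio(features, k=2, eps=1e-6):
    """Residual energy over invariant energy along k Fisher-style directions.

    This is a stand-in for the paper's ICR, not its actual definition.
    """
    invariant, residual = decompose(features)
    dim = features.shape[-1]

    # Between-sample scatter of the invariant components.
    centered = invariant - invariant.mean(axis=0)
    s_between = centered.T @ centered / centered.shape[0]

    # Within-sample scatter of the residual components.
    flat = residual.reshape(-1, dim)
    s_within = flat.T @ flat / flat.shape[0]

    # Solve S_b w = lambda * S_w w by whitening with a Cholesky factor of S_w.
    chol = np.linalg.cholesky(s_within + eps * np.eye(dim))
    inv_chol = np.linalg.inv(chol)
    whitened = inv_chol @ s_between @ inv_chol.T
    whitened = (whitened + whitened.T) / 2          # enforce exact symmetry
    _, vecs = np.linalg.eigh(whitened)              # eigenvalues ascending

    # Map the top-k eigenvectors back to feature space and normalize them.
    directions = inv_chol.T @ vecs[:, -k:]          # (dim, k)
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)

    residual_energy = np.trace(directions.T @ s_within @ directions)
    invariant_energy = np.trace(directions.T @ s_between @ directions)
    return float(residual_energy / (invariant_energy + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(64, 1, 8))              # shared signal per sample
    low = base + 0.05 * rng.normal(size=(64, 4, 8))   # weak view-specific variation
    high = base + 1.0 * rng.normal(size=(64, 4, 8))   # strong view-specific variation
    print(contamination_ratio(low), contamination_ratio(high))
```

Under these assumptions, a value of this ratio that rises over training checkpoints would play the role the abstract assigns to ICR: growing residual energy along the discriminative directions, computed from training features only.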