AI Summary
Generative models suffer from a "self-consuming loop" when synthetic data is reused across successive generations, leading to training instability and model collapse. Method: We propose a latent-space filtering approach that requires no additional data or human annotation. Our method identifies progressive degradation of low-dimensional structure in the latent space across generations, establishes a theoretical analysis framework grounded in this observation, and leverages diffusion models to dynamically characterize the degradation process. We further design a quantitative metric for low-dimensional structural quality and an adaptive filtering strategy to precisely identify and remove low-fidelity synthetic samples from mixed datasets. Contribution/Results: Extensive experiments on multiple real-world benchmarks demonstrate that our method significantly outperforms existing baselines. It effectively mitigates model collapse without any extra training overhead and consistently improves multi-generation training stability and performance.
Abstract
As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a "self-consuming loop" that can lead to training instability or model collapse. Common strategies to address the issue -- such as accumulating historical training data or injecting fresh real data -- either increase computational cost or require expensive human annotation. In this paper, we empirically analyze the latent-space dynamics of self-consuming diffusion models and observe that the low-dimensional structure of latent representations extracted from synthetic data degrades over generations. Based on this insight, we propose Latent Space Filtering (LSF), a novel approach that mitigates model collapse by filtering out less realistic synthetic data from mixed datasets. Theoretically, we present a framework that connects latent-space degradation to our empirical observations. Experimentally, we show that LSF consistently outperforms existing baselines across multiple real-world datasets, effectively mitigating model collapse without increasing training cost or relying on human annotation.
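To make the filtering idea concrete, here is a minimal sketch of scoring samples by how well their latents respect a low-dimensional structure and discarding the worst ones. The paper's actual structural-quality metric and adaptive threshold are not given in the abstract, so this example assumes a simple proxy: the residual of each latent after projection onto the batch's top-k principal subspace, with a fixed keep fraction standing in for the adaptive strategy.

```python
import numpy as np

def subspace_score(latents, k=4):
    """Score each latent by how closely it lies in the top-k principal
    subspace of the batch (hypothetical proxy for the paper's
    low-dimensional structural-quality metric)."""
    X = latents - latents.mean(axis=0)
    # Top-k principal directions via SVD of the centered batch.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                       # (d, k) subspace basis
    resid = np.linalg.norm(X - X @ V @ V.T, axis=1)
    return -resid                      # higher score = better structure

def latent_space_filter(latents, keep_frac=0.8, k=4):
    """Return indices of the keep_frac best-scoring samples.
    A full implementation would choose the threshold adaptively."""
    scores = subspace_score(latents, k=k)
    thresh = np.quantile(scores, 1.0 - keep_frac)
    return np.nonzero(scores >= thresh)[0]

# Toy data: 80 latents on a 4-dim subspace plus 20 full-rank outliers.
rng = np.random.default_rng(0)
good = rng.normal(size=(80, 4)) @ rng.normal(size=(4, 32))
bad = 3.0 * rng.normal(size=(20, 32))
kept = latent_space_filter(np.vstack([good, bad]), keep_frac=0.8, k=4)
```

On this toy batch the filter keeps the structured samples and drops most of the full-rank noise, illustrating the intended behavior without claiming to reproduce the paper's metric.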