🤖 AI Summary
This work identifies a systematic degradation phenomenon, termed “nepotistic training,” that occurs when a generative image model is fine-tuned on images it has itself synthesized. Using CLIP-guided, diffusion-based text-to-image models, the study runs controlled retraining experiments, quality assessments (e.g., FID), and distributional-shift analyses. It empirically demonstrates that the degradation is (i) contagious: injecting as little as 0.5% AI-generated images into the retraining data increases FID by over 200%; (ii) generalizable: the distortions persist across prompts not used during retraining; and (iii) persistent: subsequent fine-tuning on clean real data fails to fully restore performance. The paper formally defines and validates these three core properties of this training failure mode, with implications for sustainable generative-model training and responsible content governance.
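As a concrete illustration of the retraining protocol the summary describes, here is a minimal sketch of how a small fraction of self-generated images might be mixed into a fine-tuning set and how outputs could then be scored with FID. Everything here is an assumption for illustration: the helper names (`build_retraining_set`, `fid_score`, `relative_degradation`) and the plain-list data representation are hypothetical, not the paper's code, and the FID call uses the `torchmetrics` implementation rather than whatever the authors used.

```python
import random
import torch
from torchmetrics.image.fid import FrechetInceptionDistance  # requires torchmetrics[image]

def build_retraining_set(real_images, synthetic_images, contamination=0.005):
    """Hypothetical helper: replace a small fraction (default 0.5%) of a
    real fine-tuning set with the model's own generated images."""
    n_synthetic = int(len(real_images) * contamination)
    kept_real = random.sample(real_images, len(real_images) - n_synthetic)
    injected = random.sample(synthetic_images, n_synthetic)
    mixed = kept_real + injected
    random.shuffle(mixed)
    return mixed

def fid_score(reference, generated):
    """Score generated images against a held-out set of real images.
    Both arguments: uint8 tensors of shape (N, 3, 299, 299)."""
    fid = FrechetInceptionDistance(feature=2048)
    fid.update(reference, real=True)    # real reference distribution
    fid.update(generated, real=False)   # model outputs under test
    return fid.compute().item()

def relative_degradation(fid_baseline, fid_poisoned):
    """Fractional FID increase after retraining on the contaminated mix."""
    return (fid_poisoned - fid_baseline) / fid_baseline
```

Under this framing, the headline claim of a more-than-200% FID degradation at 0.5% contamination corresponds to `relative_degradation` exceeding 2.0, i.e., the post-retraining FID more than tripling relative to the clean baseline.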
📝 Abstract
Trained on massive amounts of human-generated content, AI image-synthesis models are capable of producing semantically coherent images that match the visual appearance of their training data. We show that when retrained on even small amounts of their own output, these generative-AI models produce highly distorted images. We also show that this distortion extends beyond the text prompts used in retraining and that, once affected, the models struggle to fully heal even after retraining on only real images.