Neon: Negative Extrapolation From Self-Training Improves Image Generation

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generative AI is hindered by the scarcity of high-quality training data; leveraging unverified synthetic data risks inducing model autophagy disorder (MAD), which severely degrades sample fidelity and diversity. To address this, the authors propose Neon, a self-training framework that, for the first time, repurposes the degenerative gradients exposed during self-training as constructive optimization signals via negative-gradient extrapolation, enabling model self-improvement without additional real data. Neon incorporates an inference-sampler-aware weight correction, works with as few as 1k synthetic samples, and adds only 0.36% training compute. On ImageNet 256×256, Neon reduces the FID of xAR-L to 1.02, establishing a new state of the art. Extensive experiments across diverse architectures and datasets validate Neon's generalizability and theoretical interpretability.

📝 Abstract
Scaling generative AI models is bottlenecked by the scarcity of high-quality training data. The ease of synthesizing from a generative model suggests using (unverified) synthetic data to augment a limited corpus of real data for the purpose of fine-tuning in the hope of improving performance. Unfortunately, however, the resulting positive feedback loop leads to model autophagy disorder (MAD, aka model collapse) that results in a rapid degradation in sample quality and/or diversity. In this paper, we introduce Neon (for Negative Extrapolation frOm self-traiNing), a new learning method that turns the degradation from self-training into a powerful signal for self-improvement. Given a base model, Neon first fine-tunes it on its own self-synthesized data but then, counterintuitively, reverses its gradient updates to extrapolate away from the degraded weights. We prove that Neon works because typical inference samplers that favor high-probability regions create a predictable anti-alignment between the synthetic and real data population gradients, which negative extrapolation corrects to better align the model with the true data distribution. Neon is remarkably easy to implement via a simple post-hoc merge that requires no new real data, works effectively with as few as 1k synthetic samples, and typically uses less than 1% additional training compute. We demonstrate Neon's universality across a range of architectures (diffusion, flow matching, autoregressive, and inductive moment matching models) and datasets (ImageNet, CIFAR-10, and FFHQ). In particular, on ImageNet 256x256, Neon elevates the xAR-L model to a new state-of-the-art FID of 1.02 with only 0.36% additional training compute. Code is available at https://github.com/SinaAlemohammad/Neon
Problem

Research questions and friction points this paper is trying to address.

Addresses model collapse from synthetic data self-training in generative AI
Introduces negative gradient extrapolation to correct distribution misalignment
Enables performance improvement without additional real data or significant compute
Innovation

Methods, ideas, or system contributions that make the work stand out.

Negative extrapolation corrects self-training degradation
Reverses gradient updates to improve model alignment
Simple post-hoc merge requiring minimal additional compute
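The post-hoc merge can be sketched as weight extrapolation away from the self-trained checkpoint. This is a minimal illustration assuming the merge rule θ_Neon = θ_base + w·(θ_base − θ_ft), i.e. the base weights are pushed in the direction opposite the self-training update; the coefficient name `w` and the helper `neon_merge` are illustrative, not taken from the paper's released code.

```python
def neon_merge(theta_base, theta_ft, w):
    """Negative extrapolation from self-training (sketch).

    theta_base: dict name -> weight, the pretrained model
    theta_ft:   dict name -> weight, after fine-tuning on self-synthesized data
    w:          extrapolation coefficient; w > 0 moves *away* from theta_ft
    """
    return {
        name: theta_base[name] + w * (theta_base[name] - theta_ft[name])
        for name in theta_base
    }

# Toy example: self-training moved a weight from 1.0 toward 0.5;
# extrapolation with w = 2 reverses and amplifies that update.
merged = neon_merge({"a": 1.0}, {"a": 0.5}, w=2.0)  # -> {"a": 2.0}
```

In practice the same arithmetic would be applied per-tensor to full model state dicts; since only a subtraction and scaling of existing checkpoints is involved, the negligible extra compute reported in the paper is plausible.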
Sina Alemohammad
Postdoctoral Fellow, University of Texas at Austin
Signal Processing · Deep Learning · Generative Models
Zhangyang Wang
ECE Department, The University of Texas at Austin
Richard G. Baraniuk
ECE Department, Rice University