Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing dataset distillation methods suffer from insufficient sample diversity, leading to distributional distortion and degraded downstream model accuracy. To address this, we propose a diffusion-based, diversity-driven distillation framework. Our key contributions are: (1) an adaptive memory mechanism that dynamically evaluates and guides the generation of highly representative and diverse distilled samples; and (2) a joint optimization objective that simultaneously enforces distribution alignment and diversity, achieved via a gradient-driven generative distillation loss and a distribution alignment loss. Extensive experiments across multiple benchmarks demonstrate that our method retains over 95% of the original model’s accuracy—even when the distilled dataset size is reduced by 90%—significantly outperforming state-of-the-art approaches.

Technology Category

Application Category

📝 Abstract

Dataset distillation enables the training of deep neural networks with comparable performance in significantly reduced time by compressing large datasets into small and representative ones. Although the introduction of generative models has made great achievements in this field, the distributions of their distilled datasets are not diverse enough to represent the original ones, leading to a decrease in downstream validation accuracy. In this paper, we present a diversity-driven generative dataset distillation method based on a diffusion model to solve this problem. We introduce self-adaptive memory to align the distribution between distilled and real datasets, assessing the representativeness. The degree of alignment leads the diffusion model to generate more diverse datasets during the distillation process. Extensive experiments show that our method outperforms existing state-of-the-art methods in most situations, proving its ability to tackle dataset distillation tasks.

Problem

Research questions and friction points this paper is trying to address.

Improving dataset diversity in generative distillation

Aligning distilled and real dataset distributions

Enhancing downstream validation accuracy via diffusion models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diversity-driven generative dataset distillation method

Self-adaptive memory aligns dataset distributions

Diffusion model enhances diversity in distillation

🔎 Similar Papers

No similar papers found.

Authors to Follow