DiRe: Diversity-promoting Regularization for Dataset Condensation

📅 2025-12-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In dataset condensation (also called dataset distillation), existing methods produce synthetic data with high redundancy and insufficient diversity. To address this, the paper proposes DiRe, a plug-and-play diversity regularizer that explicitly encourages sample-level diversity without modifying the backbone architecture. DiRe combines cosine similarity and Euclidean distance into a single explicit regularization term that can be added off-the-shelf to existing condensation pipelines. The two metrics are complementary: cosine similarity penalizes samples that point in the same direction, while Euclidean distance rewards samples that lie far apart, and together they increase intra-class dispersion and inter-class discriminability of the synthetic data. Experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet-1K report consistent improvements in both classification accuracy and diversity metrics when the regularizer is added to condensation methods including DC, DM, and DSA.

📝 Abstract

In Dataset Condensation, the goal is to synthesize a small dataset that replicates the training utility of a large original dataset. Existing condensation methods synthesize datasets with significant redundancy, so there is a dire need to reduce redundancy and improve the diversity of the synthesized datasets. To tackle this, we propose an intuitive Diversity Regularizer (DiRe) composed of cosine similarity and Euclidean distance, which can be applied off-the-shelf to various state-of-the-art condensation methods. Through extensive experiments, we demonstrate that the addition of our regularizer improves state-of-the-art condensation methods on various benchmark datasets from CIFAR-10 to ImageNet-1K with respect to generalization and diversity metrics.
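The abstract names the two ingredients of the regularizer but not its exact formula, so the following is only a minimal sketch of how a penalty built from cosine similarity and Euclidean distance could look. The function names, the pairwise averaging, the weights `w_cos` and `w_euc`, and the sign convention are all assumptions made for illustration, not the paper's definition.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def diversity_penalty(samples, w_cos=1.0, w_euc=1.0):
    """Hypothetical diversity penalty over a set of synthetic samples.

    Assumed form (not taken from the paper): mean pairwise cosine
    similarity minus mean pairwise Euclidean distance. Lower values
    mean the samples are more spread out.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0  # A single sample has no pairwise redundancy.
    mean_cos = sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
    mean_euc = sum(euclidean_distance(u, v) for u, v in pairs) / len(pairs)
    return w_cos * mean_cos - w_euc * mean_euc


# Three identical samples versus three spread-out samples.
redundant = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(diversity_penalty(redundant))
print(diversity_penalty(diverse))
```

Under these assumptions, the redundant set receives a higher penalty than the diverse set. In a condensation pipeline, a term like this would typically be added to the base method's objective with a weighting coefficient, which is what makes it usable off-the-shelf; the samples would more likely be image tensors or their feature embeddings than the two-dimensional toy vectors used here.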

Problem

Research questions and friction points this paper is trying to address.

Reducing redundancy in condensed datasets

Improving diversity of synthesized training data

Enhancing generalization across benchmark datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes DiRe regularizer using cosine similarity and Euclidean distance

Reduces redundancy and improves diversity in synthesized datasets

Applicable off-the-shelf to various state-of-the-art condensation methods

🔎 Similar Papers

Elucidating the Design Space of Dataset Condensation