DiRe: Diversity-promoting Regularization for Dataset Condensation

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
In dataset distillation, existing methods produce synthetic data with high redundancy and insufficient diversity. To address this, we propose DiRe, a plug-and-play diversity regularizer that explicitly encourages sample-level diversity without modifying the backbone architecture. DiRe combines cosine similarity and Euclidean distance into a single explicit regularization term that can be added to existing condensation pipelines. Because the two metrics are complementary (one measures the angle between samples, the other their separation), constraining both reduces redundancy among the synthetic samples. Experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet-1K show consistent improvements in both classification accuracy and diversity metrics when DiRe is added to mainstream distillation methods, including DC, DM, and DSA.

📝 Abstract
In Dataset Condensation, the goal is to synthesize a small dataset that replicates the training utility of a large original dataset. Existing condensation methods synthesize datasets with significant redundancy, so there is a dire need to reduce redundancy and improve the diversity of the synthesized datasets. To tackle this, we propose an intuitive Diversity Regularizer (DiRe) composed of cosine similarity and Euclidean distance, which can be applied off-the-shelf to various state-of-the-art condensation methods. Through extensive experiments, we demonstrate that the addition of our regularizer improves state-of-the-art condensation methods on various benchmark datasets from CIFAR-10 to ImageNet-1K with respect to generalization and diversity metrics.
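
As a rough illustration of how a penalty built from cosine similarity and Euclidean distance could score a synthetic set, the sketch below averages both quantities over all sample pairs. It is not the paper's formulation: the function name `diversity_penalty`, the weights `cos_weight` and `dist_weight`, and the way the two terms are combined (reward distance, penalize similarity) are all assumptions made for this example.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def diversity_penalty(samples, cos_weight=1.0, dist_weight=1.0):
    """Hypothetical penalty: lower values mean a more diverse set.

    ASSUMPTION: the combination rule (mean pairwise cosine similarity
    minus mean pairwise Euclidean distance) is illustrative only and is
    not taken from the paper.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    mean_cos = sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
    mean_dist = sum(euclidean_distance(u, v) for u, v in pairs) / len(pairs)
    return cos_weight * mean_cos - dist_weight * mean_dist


# Three identical samples versus three samples pointing in different directions.
redundant = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(diversity_penalty(redundant))
print(diversity_penalty(diverse))
```

Under these assumptions the redundant set receives the higher penalty, so adding such a term to a condensation loss would push synthetic samples apart. In practice the term would be computed on tensors (or feature embeddings) inside an autodiff framework so that it can be minimized jointly with the condensation objective.
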
Problem

Research questions and friction points this paper is trying to address.

Reducing redundancy in condensed datasets
Improving diversity of synthesized training data
Enhancing generalization across benchmark datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes DiRe regularizer using cosine similarity and Euclidean distance
Reduces redundancy and improves diversity in synthesized datasets
Applicable off-the-shelf to various state-of-the-art condensation methods
Saumyaranjan Mohanty
Department of Artificial Intelligence, Indian Institute of Technology Hyderabad
Aravind Reddy
Centre for Responsible AI, Wadhwani School of Data Science & AI, Indian Institute of Technology Madras
Konda Reddy Mopuri
Indian Institute of Technology Hyderabad
Deep Learning, Data Science and Engineering, Computer Vision