Understanding Dataset Distillation via Spectral Filtering

📅 2025-03-03
🤖 AI Summary
This work argues that dataset distillation (DD) fundamentally amounts to matching frequency-specific features: each DD objective acts as a filter function on the eigenvalues of the feature-feature correlation (FFC) matrix and on the frequency components of the feature-label correlation (FLC) matrix. The paper formalizes this view as UniDD, a unified framework that interprets mainstream DD methods as spectral filters and classifies them into low-frequency matching (global texture) and high-frequency matching (local details). Because these methods keep their filter fixed throughout distillation, they cannot cover both frequency ranges at once. The paper therefore introduces Curriculum Frequency Matching (CFM), which gradually adjusts the filter parameter to cover both low- and high-frequency information. Experiments on CIFAR-10, CIFAR-100, and ImageNet-1K show CFM outperforming existing baselines, with an interpretation grounded in spectral filtering.

📝 Abstract
Dataset distillation (DD) has emerged as a promising approach to compress datasets and speed up model training. However, the underlying connections among various DD methods remain largely unexplored. In this paper, we introduce UniDD, a spectral filtering framework that unifies diverse DD objectives. UniDD interprets each DD objective as a specific filter function that affects the eigenvalues of the feature-feature correlation (FFC) matrix and modulates the frequency components of the feature-label correlation (FLC) matrix. In this way, UniDD reveals that the essence of DD fundamentally lies in matching frequency-specific features. Moreover, according to the filter behaviors, we classify existing methods into low-frequency matching and high-frequency matching, encoding global texture and local details, respectively. However, existing methods rely on fixed filter functions throughout distillation, which cannot capture the low- and high-frequency information simultaneously. To address this limitation, we further propose Curriculum Frequency Matching (CFM), which gradually adjusts the filter parameter to cover both low- and high-frequency information of the FFC and FLC matrices. Extensive experiments on small-scale datasets, such as CIFAR-10/100, and large-scale datasets, including ImageNet-1K, demonstrate the superior performance of CFM over existing baselines and validate the practicality of UniDD.
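
The filter-function view described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `spectral_filter_targets` and `matching_loss`, the normalization by sample count, the way the filter is applied to the FLC matrix, and the squared-error loss are all assumptions made for illustration. Which filters count as low- or high-frequency is defined in the paper and is not assumed here.

```python
import numpy as np


def spectral_filter_targets(X, Y, filter_fn):
    """Filter the FFC spectrum and the FLC matrix (hypothetical helper).

    X: features, shape (n, d). Y: one-hot labels, shape (n, c).
    filter_fn: maps an array of eigenvalues to an array of filter responses.
    """
    n = X.shape[0]
    ffc = X.T @ X / n  # feature-feature correlation, shape (d, d)
    flc = X.T @ Y / n  # feature-label correlation, shape (d, c)

    # FFC is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(ffc)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off

    response = filter_fn(eigvals)  # one response per eigen-direction

    # Assumption: the filter replaces each eigenvalue by its response ...
    filtered_ffc = (eigvecs * response) @ eigvecs.T
    # ... and rescales the FLC components expressed in the same eigenbasis.
    filtered_flc = eigvecs @ (response[:, None] * (eigvecs.T @ flc))
    return filtered_ffc, filtered_flc


def matching_loss(X_real, Y_real, X_syn, Y_syn, filter_fn):
    """Squared Frobenius distance between filtered statistics (assumed loss)."""
    ffc_r, flc_r = spectral_filter_targets(X_real, Y_real, filter_fn)
    ffc_s, flc_s = spectral_filter_targets(X_syn, Y_syn, filter_fn)
    return float(np.sum((ffc_r - ffc_s) ** 2) + np.sum((flc_r - flc_s) ** 2))
```

Passing `lambda lam: lam` as `filter_fn` leaves the FFC matrix unchanged, while a shape such as `lambda lam: 1.0 / (lam + 1.0)` (a hypothetical choice) reweights the eigen-directions. Under this sketch, changing the filter is the only difference between one matching objective and another.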

Problem

Research questions and friction points this paper is trying to address.

Unifies diverse dataset distillation objectives via spectral filtering.
Classifies methods into low- and high-frequency feature matching.
Proposes Curriculum Frequency Matching to capture both low- and high-frequency information.

Innovation

Methods, ideas, or system contributions that make the work stand out.

UniDD unifies dataset distillation via spectral filtering.
Curriculum Frequency Matching adjusts filter parameters dynamically.
CFM captures both low- and high-frequency information effectively.
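
The idea of gradually adjusting a filter parameter during distillation can be sketched as a simple schedule. The source only says the parameter is adjusted gradually, so the linear interpolation, the function name `curriculum_filter_parameter`, and the `start` and `end` values below are assumptions for illustration, not the schedule used in the paper.

```python
def curriculum_filter_parameter(step, total_steps, start, end):
    """Linearly move a filter parameter from `start` to `end` (assumed schedule)."""
    if total_steps <= 1:
        return end
    # Clamp the step so out-of-range values stay at the schedule endpoints.
    clamped = min(max(step, 0), total_steps - 1)
    progress = clamped / (total_steps - 1)
    return start + progress * (end - start)


# Usage: one parameter value per distillation step (hypothetical endpoints).
schedule = [curriculum_filter_parameter(s, 5, 0.0, 1.0) for s in range(5)]
```

Each step's value would then parameterize the filter function used for that step's matching objective, so early and late steps emphasize different parts of the spectrum.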