🤖 AI Summary
This work reveals that dataset distillation (DD) fundamentally operates as spectral-domain matching. Each DD objective acts as a filter on the eigenvalues of the feature-feature correlation (FFC) matrix, and this filter modulates the frequency components of the feature-label correlation (FLC) matrix. Low-frequency components encode global texture, and high-frequency components encode local details. To formalize this insight, we propose UniDD, a unified framework that interprets mainstream DD methods as selective spectral filters and groups them into low-frequency and high-frequency matching. Because existing methods keep their filter fixed throughout distillation, none of them captures both bands at once. We therefore introduce Curriculum Frequency Matching (CFM), a dynamic strategy that gradually adjusts the filter parameter so that distillation covers both low- and high-frequency information. Our analysis thus connects DD to spectral filtering. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K show that CFM outperforms existing baselines and support UniDD as a practical, spectrally interpretable view of DD.
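For readers who want the filtering view written out, here is one possible formalization. The notation and the exact form are our assumptions, not the paper's equations: X ∈ ℝ^{n×d} holds features, Y ∈ ℝ^{n×c} holds one-hot labels, a filter h acts elementwise on the eigenvalues of the feature-feature correlation matrix, and the filtered feature-label correlation of the real set 𝒯 is matched to that of the synthetic set 𝒮.

```latex
% Assumed notation (not taken from the paper): real set T, synthetic set S,
% features X in R^{n x d}, one-hot labels Y in R^{n x c}, filter h applied elementwise to eigenvalues.
\begin{aligned}
C_{XX} &= \tfrac{1}{n} X^{\top} X = U \Lambda U^{\top}, \qquad
C_{XY} = \tfrac{1}{n} X^{\top} Y, \\
\tilde{C}_{XY} &= U\, h(\Lambda)\, U^{\top} C_{XY}
  = \sum_{i=1}^{d} h(\lambda_i)\, u_i u_i^{\top} C_{XY}, \\
\mathcal{L}(\mathcal{S}) &= \bigl\lVert \tilde{C}_{XY}^{\mathcal{T}} - \tilde{C}_{XY}^{\mathcal{S}} \bigr\rVert_F^{2}.
\end{aligned}
```

Under this form, a filter that grows with λ (for example h(λ) = λ) emphasizes the dominant eigen-directions, while one that decays with λ (for example h(λ) = 1/(λ + ε)) amplifies the weak ones. Reading the former as low-frequency matching and the latter as high-frequency matching is likewise an assumption of this sketch.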
📝 Abstract
Dataset distillation (DD) has emerged as a promising approach to compress datasets and speed up model training. However, the underlying connections among various DD methods remain largely unexplored. In this paper, we introduce UniDD, a spectral filtering framework that unifies diverse DD objectives. UniDD interprets each DD objective as a specific filter function that affects the eigenvalues of the feature-feature correlation (FFC) matrix and modulates the frequency components of the feature-label correlation (FLC) matrix. In this way, UniDD reveals that DD fundamentally amounts to matching frequency-specific features. Moreover, based on their filter behaviors, we classify existing methods into low-frequency matching and high-frequency matching, which encode global texture and local details, respectively. However, existing methods rely on fixed filter functions throughout distillation and therefore cannot capture low- and high-frequency information simultaneously. To address this limitation, we further propose Curriculum Frequency Matching (CFM), which gradually adjusts the filter parameter to cover both the low- and high-frequency information of the FFC and FLC matrices. Extensive experiments on small-scale datasets, such as CIFAR-10/100, and large-scale datasets, including ImageNet-1K, demonstrate the superior performance of CFM over existing baselines and validate the practicality of UniDD.
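The abstract does not give UniDD's filter functions or CFM's schedule, so the following is only a minimal NumPy sketch under stated assumptions. It takes FFC = XᵀX/n and FLC = XᵀY/n, uses a hypothetical power filter h(λ) = (λ + ε)^t on the FFC spectrum, and moves t with a hypothetical linear schedule. It also assumes that large-eigenvalue directions correspond to low frequency and small-eigenvalue directions to high frequency. The names `filtered_flc`, `matching_loss`, and `curriculum_t` are ours. The point is to show how one filter parameter can sweep the matched FLC components from one end of the FFC spectrum to the other, which is the idea CFM is described as implementing.

```python
import numpy as np


def filtered_flc(X, Y, t, eps=1e-6):
    """Filter the feature-label correlation (FLC) through the FFC spectrum.

    Hypothetical filter family (assumption, not the paper's): h(lam) = (lam + eps) ** t.
      t > 0 up-weights large-eigenvalue directions (treated here as low frequency),
      t < 0 up-weights small-eigenvalue directions (treated here as high frequency),
      t = 0 is all-pass and returns the plain FLC.
    """
    n = X.shape[0]
    ffc = X.T @ X / n                      # d x d feature-feature correlation (symmetric PSD)
    flc = X.T @ Y / n                      # d x c feature-label correlation
    lam, U = np.linalg.eigh(ffc)           # eigenvalues ascending, columns of U orthonormal
    lam = np.clip(lam, 0.0, None)          # drop tiny negative round-off
    h = (lam + eps) ** t                   # filter response per eigenvalue
    return (U * h) @ (U.T @ flc)           # equals U diag(h) U^T FLC


def matching_loss(X_real, Y_real, X_syn, Y_syn, t):
    """Squared Frobenius distance between the filtered FLCs of real and synthetic data."""
    diff = filtered_flc(X_real, Y_real, t) - filtered_flc(X_syn, Y_syn, t)
    return float(np.sum(diff ** 2))


def curriculum_t(step, total_steps, t_start=1.0, t_end=-1.0):
    """Hypothetical linear schedule moving the filter from low-pass (t_start) to high-pass (t_end)."""
    frac = step / max(total_steps - 1, 1)
    return t_start + frac * (t_end - t_start)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_real = rng.normal(size=(500, 16))                    # stand-in real features
    Y_real = np.eye(10)[rng.integers(0, 10, size=500)]     # one-hot labels, 10 classes
    X_syn = rng.normal(size=(20, 16))                      # stand-in synthetic features
    Y_syn = np.eye(10)[np.arange(20) % 10]                 # two synthetic samples per class
    for step in range(5):
        t = curriculum_t(step, 5)
        loss = matching_loss(X_real, Y_real, X_syn, Y_syn, t)
        print(f"step {step}: t={t:+.2f}, loss={loss:.4f}")
```

A power filter was chosen because t = 0 reduces to plain FLC matching and t = 1 to matching FFC·FLC, which makes the sweep easy to reason about. The sketch only evaluates the objective; optimizing the synthetic data would require differentiating through it with an autodiff framework.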