Improving Noise Efficiency in Privacy-preserving Dataset Distillation

📅 2025-08-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing privacy-preserving dataset distillation methods suffer from excessive differential privacy (DP) noise and inefficient utilization of private information, primarily due to synchronous sampling-and-optimization and reliance on noisy gradient signals from randomly initialized networks. This work proposes a novel decoupled framework that separates sampling from optimization and introduces, for the first time, a subspace matching mechanism—aligning gradients or feature responses within informative low-dimensional subspaces—to effectively suppress DP noise interference and enhance signal fidelity and convergence stability. The method integrates differential privacy, dataset distillation, subspace matching, and stochastic network signal optimization. On CIFAR-10, it achieves a 10.0% accuracy gain using only 50 distilled images per class; remarkably, with just one-fifth the size of standard distilled sets, it matches or exceeds state-of-the-art performance—demonstrating substantial improvements in DP noise efficiency and private data utility.
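The core subspace-matching idea — project noisy gradients onto an informative low-dimensional subspace so that most of the isotropic DP noise is discarded — can be sketched as follows. This is a minimal, hypothetical illustration under simplifying assumptions (the subspace is recovered here by an SVD of clean gradient vectors on toy data), not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_subspace(grads, k):
    # Orthonormal basis for the top-k principal directions of a
    # stack of gradient vectors (one gradient per row).
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:k].T  # shape (d, k)

def project(v, basis):
    # Project v onto the column space of `basis`.
    return basis @ (basis.T @ v)

# Toy setup: d-dim gradients whose signal lives in a k-dim subspace.
d, k, n = 64, 4, 32
signal_basis = np.linalg.qr(rng.standard_normal((d, k)))[0]
clean = rng.standard_normal((n, k)) @ signal_basis.T
noisy = clean + 0.5 * rng.standard_normal((n, d))  # DP-style Gaussian noise

basis = top_k_subspace(clean, k)  # informative subspace (toy: from clean grads)
g = noisy[0]
g_proj = project(g, basis)

# Projection keeps the in-subspace signal but discards the
# (d - k)/d fraction of the noise lying outside the subspace.
err_raw = np.linalg.norm(g - clean[0])
err_proj = np.linalg.norm(g_proj - clean[0])
print(err_proj < err_raw)
```

The intuition matches the summary's claim: isotropic DP noise spreads its energy over all `d` dimensions, so restricting the matching loss to a `k`-dimensional informative subspace retains the signal while suppressing roughly a `(d - k)/d` fraction of the noise energy.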

📝 Abstract
Modern machine learning models heavily rely on large datasets that often include sensitive and private information, raising serious privacy concerns. Differentially private (DP) data generation offers a solution by creating synthetic datasets that limit the leakage of private information within a predefined privacy budget; however, it requires a substantial amount of data to achieve performance comparable to models trained on the original data. To mitigate the significant expense incurred with synthetic data generation, Dataset Distillation (DD) stands out for its remarkable training and storage efficiency. This efficiency is particularly advantageous when integrated with DP mechanisms, curating compact yet informative synthetic datasets without compromising privacy. However, current state-of-the-art private DD methods suffer from a synchronized sampling-optimization process and a dependency on noisy training signals from randomly initialized networks, resulting in inefficient utilization of private information due to the addition of excessive noise. To address these issues, we introduce a novel framework that decouples sampling from optimization for better convergence and improves signal quality by mitigating the impact of DP noise through matching in an informative subspace. On CIFAR-10, our method achieves a 10.0% improvement with 50 images per class and an 8.3% increase with just one-fifth the distilled set size of previous state-of-the-art methods, demonstrating significant potential to advance privacy-preserving DD.
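For context on where the DP noise enters, the standard Gaussian-mechanism gradient query used in DP training looks roughly like the sketch below. This is the generic DP-SGD-style primitive, not this paper's specific pipeline; the function name and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_mechanism_sum(grads, clip_norm, noise_multiplier):
    # Per-example clipping bounds the L2 sensitivity of the sum
    # to `clip_norm`; adding Gaussian noise calibrated to that
    # sensitivity then yields an (eps, delta)-DP release.
    clipped = []
    for g in grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / norm))
    noisy_sum = np.sum(clipped, axis=0)
    noisy_sum = noisy_sum + rng.normal(
        0.0, noise_multiplier * clip_norm, size=noisy_sum.shape
    )
    return noisy_sum
```

Note that the noise scale depends only on `clip_norm` and `noise_multiplier`, not on how useful the queried signal is — which is why repeatedly querying noisy gradients from randomly initialized networks, as the abstract criticizes, spends the privacy budget on low-fidelity signal.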
Problem

Research questions and friction points this paper is trying to address.

Noise inefficiency in privacy-preserving dataset distillation
Excessive DP noise added during synthetic data generation
Poor signal quality from noisy gradients of randomly initialized networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples sampling from optimization for stabler convergence
Improves signal quality via matching in an informative subspace
Matches or exceeds state-of-the-art accuracy with one-fifth the distilled set size