Finding Stable Subnetworks at Initialization with Dataset Distillation

📅 2025-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dense initialization instability undermines the Lottery Ticket Hypothesis (LTH), hindering reliable identification of trainable sparse subnetworks (“winning tickets”). Method: We integrate dataset distillation with iterative magnitude pruning (IMP) to directly search for stable, trainable subnetworks at initialization—bypassing unstable dense training. Contribution/Results: First, we empirically establish that stable sparse substructures persist even within unstable dense initializations. Second, we demonstrate that distilled data guide pruning toward smoother regions of the loss landscape, enabling robust winning ticket discovery at higher sparsity levels. We validate generalization robustness via linear mode connectivity and SGD noise stability analysis. Experiments show that, using only 1/150 of the original CIFAR-10 dataset, our method recovers the retraining performance of standard winning tickets on ResNet-18 while achieving stable convergence at significantly higher sparsity.
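The core loop the summary describes is iterative magnitude pruning (IMP) with weight rewinding: train, prune the smallest-magnitude surviving weights, then reset the survivors to their initialization values. Below is a minimal numpy sketch of one such round; it is not the paper's code, the training step is abstracted away (the trained weights are simply passed in, whether trained on full or distilled data), and the names are illustrative.

```python
import numpy as np

def imp_round(weights_init, weights_trained, mask, prune_frac):
    """One round of iterative magnitude pruning with rewinding:
    prune the smallest-magnitude surviving weights, then rewind
    the survivors to their initialization values (the 'ticket')."""
    surviving = np.abs(weights_trained[mask])
    k = int(prune_frac * surviving.size)
    if k > 0:
        # Threshold at the k-th smallest surviving magnitude.
        threshold = np.sort(surviving)[k - 1]
        mask = mask & (np.abs(weights_trained) > threshold)
    # Winning-ticket candidate: initialization weights under the new mask.
    return weights_init * mask, mask
```

In the paper's variant, the training that produces `weights_trained` in each round would use the distilled dataset rather than the full training set, which is what makes the search cheap.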

📝 Abstract
Recent works have shown that Dataset Distillation, the process of summarizing a training dataset into a much smaller synthetic one, can be leveraged to accelerate the training of deep learning models. However, its impact on training dynamics, particularly in neural network pruning, remains largely unexplored. In our work, we use distilled data in the inner loop of iterative magnitude pruning to produce sparse, trainable subnetworks at initialization -- more commonly known as lottery tickets. While using 150x fewer training points, our algorithm matches the performance of traditional lottery ticket rewinding on ResNet-18 and CIFAR-10. Previous work highlights that lottery tickets can be found when the dense initialization is stable to SGD noise (i.e. training across different orderings of the data converges to the same minima). We extend this discovery, demonstrating that stable subnetworks can exist even within an unstable dense initialization. In our linear mode connectivity studies, we find that pruning with distilled data discards parameters that contribute to the sharpness of the loss landscape. Lastly, we show that by first generating a stable sparsity mask at initialization, we can find lottery tickets at significantly higher sparsities than traditional iterative magnitude pruning.
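The linear mode connectivity test mentioned in the abstract checks whether the loss stays low along the straight line between two trained solutions (e.g. two runs from the same ticket with different SGD data orderings). A large "barrier" along that line signals instability. A minimal numpy sketch, with an illustrative `loss_fn` standing in for evaluating a network's loss at a given parameter vector:

```python
import numpy as np

def loss_barrier(theta_a, theta_b, loss_fn, n_points=11):
    """Linear mode connectivity check: evaluate loss along the straight
    line between two solutions. Two solutions are linearly mode connected
    when the interpolated loss never rises much above the endpoints."""
    alphas = np.linspace(0.0, 1.0, n_points)
    losses = [loss_fn((1 - a) * theta_a + a * theta_b) for a in alphas]
    return max(losses) - max(losses[0], losses[-1])
```

A barrier near zero indicates the two runs landed in the same linearly connected basin; a large positive barrier indicates SGD-noise instability.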
Problem

Research questions and friction points this paper is trying to address.

Exploring dataset distillation's impact on neural network pruning dynamics
Finding stable subnetworks within unstable dense initializations
Enabling higher sparsity lottery tickets via stable initialization masks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses distilled data in iterative magnitude pruning
Finds stable subnetworks within unstable initializations
Generates stable sparsity masks at initialization