Finding Stable Subnetworks at Initialization with Dataset Distillation

📅 2025-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dense initialization instability undermines the Lottery Ticket Hypothesis (LTH), hindering reliable identification of trainable sparse subnetworks (“winning tickets”). Method: We integrate dataset distillation with iterative magnitude pruning (IMP) to directly search for stable, trainable subnetworks at initialization—bypassing unstable dense training. Contribution/Results: First, we empirically establish that stable sparse substructures persist even within unstable dense initializations. Second, we demonstrate that distilled data guide pruning toward smoother regions of the loss landscape, enabling robust winning ticket discovery at higher sparsity levels. We validate generalization robustness via linear mode connectivity and SGD noise stability analysis. Experiments show that, using only 1/150 of the original CIFAR-10 dataset, our method recovers the retraining performance of standard winning tickets on ResNet-18 while achieving stable convergence at significantly higher sparsity.
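The core loop the summary describes is iterative magnitude pruning (IMP) with weight rewinding: train, prune the smallest-magnitude surviving weights, then reset the survivors to their initialization values. Below is a minimal numpy sketch of one such round; it is not the paper's code, the training step is abstracted away (the trained weights are simply passed in, whether trained on full or distilled data), and the names are illustrative.

```python
import numpy as np

def imp_round(weights_init, weights_trained, mask, prune_frac):
    """One round of iterative magnitude pruning with rewinding:
    prune the smallest-magnitude surviving weights, then rewind
    the survivors to their initialization values (the 'ticket')."""
    surviving = np.abs(weights_trained[mask])
    k = int(prune_frac * surviving.size)
    if k > 0:
        # Threshold at the k-th smallest surviving magnitude.
        threshold = np.sort(surviving)[k - 1]
        mask = mask & (np.abs(weights_trained) > threshold)
    # Winning-ticket candidate: initialization weights under the new mask.
    return weights_init * mask, mask
```

In the paper's variant, the training that produces `weights_trained` in each round would use the distilled dataset rather than the full training set, which is what makes the search cheap.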

📝 Abstract
Recent works have shown that Dataset Distillation, the process of summarizing a training dataset into a much smaller synthetic one, can be leveraged to accelerate the training of deep learning models. However, its impact on training dynamics, particularly in neural network pruning, remains largely unexplored. In our work, we use distilled data in the inner loop of iterative magnitude pruning to produce sparse, trainable subnetworks at initialization -- more commonly known as lottery tickets. While using 150x fewer training points, our algorithm matches the performance of traditional lottery ticket rewinding on ResNet-18 and CIFAR-10. Previous work highlights that lottery tickets can be found when the dense initialization is stable to SGD noise (i.e. training across different orderings of the data converges to the same minima). We extend this discovery, demonstrating that stable subnetworks can exist even within an unstable dense initialization. In our linear mode connectivity studies, we find that pruning with distilled data discards parameters that contribute to the sharpness of the loss landscape. Lastly, we show that by first generating a stable sparsity mask at initialization, we can find lottery tickets at significantly higher sparsities than traditional iterative magnitude pruning.
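The linear mode connectivity test mentioned in the abstract checks whether the loss stays low along the straight line between two trained solutions (e.g. two runs from the same ticket with different SGD data orderings). A large "barrier" along that line signals instability. A minimal numpy sketch, with an illustrative `loss_fn` standing in for evaluating a network's loss at a given parameter vector:

```python
import numpy as np

def loss_barrier(theta_a, theta_b, loss_fn, n_points=11):
    """Linear mode connectivity check: evaluate loss along the straight
    line between two solutions. Two solutions are linearly mode connected
    when the interpolated loss never rises much above the endpoints."""
    alphas = np.linspace(0.0, 1.0, n_points)
    losses = [loss_fn((1 - a) * theta_a + a * theta_b) for a in alphas]
    return max(losses) - max(losses[0], losses[-1])
```

A barrier near zero indicates the two runs landed in the same linearly connected basin; a large positive barrier indicates SGD-noise instability.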
Problem

Research questions and friction points this paper is trying to address.

Exploring dataset distillation's impact on neural network pruning dynamics
Finding stable subnetworks within unstable dense initializations
Enabling higher sparsity lottery tickets via stable initialization masks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses distilled data in iterative magnitude pruning
Finds stable subnetworks within unstable initializations
Generates stable sparsity masks at initialization