🤖 AI Summary
Unstructured pruning typically requires multiple iterative cycles of training, pruning, and fine-tuning, incurring substantial computational overhead. To address this, we propose a teacher-guided one-shot global pruning framework. Our method incorporates first-order gradient signals from a teacher model during importance scoring and integrates context-aware knowledge distillation to precisely identify and retain critical parameters. Notably, we are the first to embed knowledge distillation directly into the pruning evaluation phase, rather than using it solely for post-pruning recovery, thereby improving performance retention under high sparsity. The framework combines unstructured pruning, gradient-guided importance scoring, context-aware distillation, and sparsity-aware retraining. Experiments on CIFAR-10/100 and TinyImageNet demonstrate that our approach significantly outperforms state-of-the-art baselines, including EPG and EPSD, at high sparsity levels, while also being more efficient than iterative methods such as COLT.
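The summary does not give the exact objective, but a standard way to inject teacher signal into the scoring gradients is a distillation loss blended with the task cross-entropy. The sketch below is a minimal, hedged illustration under that assumption; the function name `distillation_loss` and the knobs `T` (temperature) and `lam` (blend weight) are hypothetical, not taken from the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with optional temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.9):
    """Hypothetical scoring objective: cross-entropy on ground-truth labels
    blended with a temperature-softened KL term toward the teacher.
    Gradients of THIS loss (not the task loss alone) would drive the
    importance scores computed before pruning."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    # KL(teacher || student), scaled by T^2 as in standard distillation
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T * T
    # Plain cross-entropy on the hard labels
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return (1 - lam) * ce + lam * kd
```

When student and teacher logits agree, the KL term vanishes and only the cross-entropy component remains, so `lam` smoothly interpolates between pure task supervision and pure teacher imitation.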
📝 Abstract
Unstructured pruning remains a powerful strategy for compressing deep neural networks, yet it often demands iterative train-prune-retrain cycles that incur significant computational overhead. To address this challenge, we introduce a novel teacher-guided pruning framework that tightly integrates Knowledge Distillation (KD) with importance score estimation. Unlike prior approaches that apply KD only as a post-pruning recovery step, our method leverages teacher-informed gradient signals during importance score calculation to identify and retain the parameters most critical for both task performance and knowledge transfer. This enables a one-shot global pruning strategy that efficiently eliminates redundant weights while preserving essential representations. After pruning, we apply sparsity-aware retraining, both with and without KD, to recover accuracy without reactivating pruned connections. Comprehensive experiments across multiple image classification benchmarks, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our method consistently achieves high sparsity levels with minimal performance degradation. Notably, it outperforms state-of-the-art baselines such as EPG and EPSD at high sparsity while remaining more computationally efficient than iterative schemes like COLT, making the framework a performance-preserving solution well suited for deployment in resource-constrained environments.
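The one-shot global pruning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper name `global_prune_masks`, the first-order score |w · g|, and the blending knob `alpha` between task and teacher (KD) gradients are all hypothetical choices consistent with the abstract's description:

```python
import numpy as np

def global_prune_masks(weights, task_grads, kd_grads, sparsity, alpha=0.5):
    """One-shot global pruning sketch: score each weight by the magnitude of
    weight * (blended task + distillation gradient), then keep the
    highest-scoring (1 - sparsity) fraction of weights across ALL layers
    at once, rather than pruning layer by layer.

    weights / task_grads / kd_grads: lists of same-shaped arrays, one per layer.
    alpha: hypothetical blend between task and teacher gradient signals.
    Returns boolean keep-masks (True = weight survives), one per layer.
    """
    # First-order importance: |w * g|, with g a blend of task and KD gradients.
    scores = [np.abs(w * (alpha * gt + (1 - alpha) * gk))
              for w, gt, gk in zip(weights, task_grads, kd_grads)]
    flat = np.concatenate([s.ravel() for s in scores])
    # Global threshold: prune the lowest-scoring `sparsity` fraction overall.
    k = min(int(sparsity * flat.size), flat.size - 1)
    thresh = np.partition(flat, k)[k] if k > 0 else -np.inf
    return [s >= thresh for s in scores]
```

Because the threshold is computed over the concatenated scores of every layer, sparsity is allocated non-uniformly: layers whose weights matter less (by this score) are pruned more aggressively, which is the usual motivation for global rather than per-layer unstructured pruning. Sparsity-aware retraining would then update only the weights where the mask is True, leaving pruned connections at zero.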