Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Unstructured pruning typically requires multiple iterative cycles of training, pruning, and fine-tuning, incurring substantial computational overhead. To address this, we propose a teacher-guided one-shot global pruning framework. Our method incorporates first-order gradient signals from a teacher model during importance scoring and integrates context-aware knowledge distillation to precisely identify and retain critical parameters. Notably, we are the first to embed knowledge distillation directly into the pruning evaluation phase—rather than solely for post-pruning recovery—thereby enhancing performance retention under high sparsity. The framework jointly leverages unstructured pruning, gradient-guided importance scoring, context-aware distillation, and sparsity-aware retraining. Experiments on CIFAR-10/100 and TinyImageNet demonstrate that our approach significantly outperforms state-of-the-art baselines—including EPG and EPSD—at high sparsity levels, while also achieving superior efficiency compared to iterative methods such as COLT.

📝 Abstract
Unstructured pruning remains a powerful strategy for compressing deep neural networks, yet it often demands iterative train-prune-retrain cycles, resulting in significant computational overhead. To address this challenge, we introduce a novel teacher-guided pruning framework that tightly integrates Knowledge Distillation (KD) with importance score estimation. Unlike prior approaches that apply KD as a post-pruning recovery step, our method leverages gradient signals informed by the teacher during importance score calculation to identify and retain parameters most critical for both task performance and knowledge transfer. Our method facilitates a one-shot global pruning strategy that efficiently eliminates redundant weights while preserving essential representations. After pruning, we employ sparsity-aware retraining with and without KD to recover accuracy without reactivating pruned connections. Comprehensive experiments across multiple image classification benchmarks, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our method consistently achieves high sparsity levels with minimal performance degradation. Notably, our approach outperforms state-of-the-art baselines such as EPG and EPSD at high sparsity levels, while offering a more computationally efficient alternative to iterative pruning schemes like COLT. The proposed framework offers a computation-efficient, performance-preserving solution well suited for deployment in resource-constrained environments.
Problem

Research questions and friction points this paper is trying to address.

Eliminates iterative train-prune-retrain cycles in neural network pruning
Identifies critical parameters for task performance and knowledge transfer
Achieves high sparsity with minimal accuracy loss for efficient deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Teacher-guided pruning with knowledge distillation integration
One-shot global pruning using gradient-informed importance scores
Sparsity-aware retraining without reactivating pruned connections
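The last bullet, sparsity-aware retraining without reactivating pruned connections, amounts to freezing the pruning mask and masking every weight update. A small illustrative sketch under assumed names (`masked_sgd_step`, the toy mask, and the random gradients are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a small weight vector and a fixed binary mask
# produced by the one-shot pruning step (True = connection survives).
weights = rng.normal(size=8)
mask = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=bool)
weights = weights * mask  # prune: zero out removed connections

def masked_sgd_step(w, grad, mask, lr=0.1):
    """Update only surviving weights; pruned entries stay exactly zero."""
    return (w - lr * grad) * mask

# A few toy retraining steps with stand-in gradients. Because the mask
# is reapplied after every update, pruned connections can never revive.
for _ in range(5):
    grad = rng.normal(size=8)
    weights = masked_sgd_step(weights, grad, mask)
```

In an actual framework this masking would be applied per tensor inside the optimizer step (with or without a distillation term in the loss, as the abstract notes), but the invariant is the same: the sparsity pattern fixed at pruning time is preserved throughout retraining.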
Md. Samiul Alim
Apurba-NSU R&D Lab, Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
Sharjil Khan
Apurba-NSU R&D Lab, Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
Amrijit Biswas
Apurba-NSU R&D Lab, Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
Fuad Rahman
Apurba Technologies, Sunnyvale, CA 94085, USA
Shafin Rahman
Associate Professor, ECE, North South University, Bangladesh
Computer Vision · Machine Learning
Nabeel Mohammed
North South University
Natural Language Processing · Computer Vision · Deep Learning