🤖 AI Summary
Unstructured pruning typically requires multiple iterative cycles of training, pruning, and fine-tuning, incurring substantial computational overhead. To address this, we propose a teacher-guided one-shot global pruning framework. Our method incorporates first-order gradient signals from a teacher model during importance scoring and integrates context-aware knowledge distillation to precisely identify and retain critical parameters. Notably, we are the first to embed knowledge distillation directly into the pruning evaluation phase, rather than using it solely for post-pruning recovery, thereby improving performance retention under high sparsity. The framework combines unstructured pruning, gradient-guided importance scoring, context-aware distillation, and sparsity-aware retraining. Experiments on CIFAR-10/100 and TinyImageNet demonstrate that our approach significantly outperforms state-of-the-art baselines, including EPG and EPSD, at high sparsity levels, while also being more efficient than iterative methods such as COLT.
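The summary does not give the exact objective, but a standard way to inject teacher signal into the scoring gradients is a distillation loss blended with the task cross-entropy. The sketch below is a minimal, hedged illustration under that assumption; the function name `distillation_loss` and the knobs `T` (temperature) and `lam` (blend weight) are hypothetical, not taken from the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with optional temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.9):
    """Hypothetical scoring objective: cross-entropy on ground-truth labels
    blended with a temperature-softened KL term toward the teacher.
    Gradients of THIS loss (not the task loss alone) would drive the
    importance scores computed before pruning."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    # KL(teacher || student), scaled by T^2 as in standard distillation
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T * T
    # Plain cross-entropy on the hard labels
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return (1 - lam) * ce + lam * kd
```

When student and teacher logits agree, the KL term vanishes and only the cross-entropy component remains, so `lam` smoothly interpolates between pure task supervision and pure teacher imitation.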
📝 Abstract
Unstructured pruning remains a powerful strategy for compressing deep neural networks, yet it often demands iterative train-prune-retrain cycles that incur significant computational overhead. To address this challenge, we introduce a novel teacher-guided pruning framework that tightly integrates Knowledge Distillation (KD) with importance score estimation. Unlike prior approaches that apply KD only as a post-pruning recovery step, our method leverages teacher-informed gradient signals during importance score calculation to identify and retain the parameters most critical for both task performance and knowledge transfer. This enables a one-shot global pruning strategy that efficiently eliminates redundant weights while preserving essential representations. After pruning, we apply sparsity-aware retraining, both with and without KD, to recover accuracy without reactivating pruned connections. Comprehensive experiments across multiple image classification benchmarks, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our method consistently achieves high sparsity levels with minimal performance degradation. Notably, it outperforms state-of-the-art baselines such as EPG and EPSD at high sparsity while remaining more computationally efficient than iterative schemes like COLT, making the framework a performance-preserving solution well suited for deployment in resource-constrained environments.
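The one-shot global pruning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper name `global_prune_masks`, the first-order score |w · g|, and the blending knob `alpha` between task and teacher (KD) gradients are all hypothetical choices consistent with the abstract's description:

```python
import numpy as np

def global_prune_masks(weights, task_grads, kd_grads, sparsity, alpha=0.5):
    """One-shot global pruning sketch: score each weight by the magnitude of
    weight * (blended task + distillation gradient), then keep the
    highest-scoring (1 - sparsity) fraction of weights across ALL layers
    at once, rather than pruning layer by layer.

    weights / task_grads / kd_grads: lists of same-shaped arrays, one per layer.
    alpha: hypothetical blend between task and teacher gradient signals.
    Returns boolean keep-masks (True = weight survives), one per layer.
    """
    # First-order importance: |w * g|, with g a blend of task and KD gradients.
    scores = [np.abs(w * (alpha * gt + (1 - alpha) * gk))
              for w, gt, gk in zip(weights, task_grads, kd_grads)]
    flat = np.concatenate([s.ravel() for s in scores])
    # Global threshold: prune the lowest-scoring `sparsity` fraction overall.
    k = min(int(sparsity * flat.size), flat.size - 1)
    thresh = np.partition(flat, k)[k] if k > 0 else -np.inf
    return [s >= thresh for s in scores]
```

Because the threshold is computed over the concatenated scores of every layer, sparsity is allocated non-uniformly: layers whose weights matter less (by this score) are pruned more aggressively, which is the usual motivation for global rather than per-layer unstructured pruning. Sparsity-aware retraining would then update only the weights where the mask is True, leaving pruned connections at zero.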