Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying deep neural networks on edge devices requires aggressive compression while preserving accuracy, yet layers exhibit markedly different sensitivities to pruning, making a single pruning strategy suboptimal. This work proposes a globally guided, layer-wise sparsification framework that systematically generates diverse, high-quality pruning configurations by integrating multiple sensitivity signals—such as weight magnitude and gradient information—with architecture-aware rules, including normalization layer preservation and aggressive pruning of the classification head, without introducing new pruning criteria. The method achieves Pareto-optimal performance across both CNNs and Vision Transformers, reducing accuracy loss by 40% on the Swin-Tiny model compared to single-criterion pruning approaches.

📝 Abstract
Deploying deep neural networks (DNNs) on edge devices requires strong compression with minimal accuracy loss. This paper introduces Mix-and-Match Pruning, a globally guided, layer-wise sparsification framework that leverages sensitivity scores and simple architectural rules to generate diverse, high-quality pruning configurations. The framework addresses a key limitation of single-strategy approaches: different layers and architectures respond differently to pruning, so applying one uniform strategy is suboptimal. Mix-and-Match derives architecture-aware sparsity ranges, e.g., preserving normalization layers while pruning classifiers more aggressively, and systematically samples these ranges to produce ten strategies per sensitivity signal (magnitude, gradient, or their combination). This eliminates repeated pruning runs while offering deployment-ready accuracy-sparsity trade-offs. Experiments on CNNs and Vision Transformers demonstrate Pareto-optimal results, with Mix-and-Match reducing accuracy degradation on Swin-Tiny by 40% relative to standard single-criterion pruning. These findings show that coordinating existing pruning signals yields more reliable and efficient compressed models than introducing new criteria.
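The abstract's pipeline (architecture-aware sparsity ranges, sampled layer-wise strategies, magnitude-based pruning) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the layer names, the specific range values, and the uniform sampling rule are all assumptions made for the example.

```python
# Sketch of globally guided layer-wise sparsification (assumed details, not
# the paper's implementation): each layer gets an architecture-aware sparsity
# range, configurations are sampled from those ranges, and weights are pruned
# by magnitude per layer.
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": layer name -> weight tensor.
layers = {
    "conv1.weight": rng.normal(size=(16, 3, 3, 3)),
    "norm1.weight": rng.normal(size=(16,)),
    "classifier.weight": rng.normal(size=(10, 16)),
}

def sparsity_range(name):
    """Architecture-aware rule (example values): preserve normalization
    layers, prune the classification head more aggressively."""
    if "norm" in name:
        return (0.0, 0.0)   # never prune normalization layers
    if "classifier" in name:
        return (0.5, 0.9)   # aggressive range for the head
    return (0.2, 0.7)       # default range for remaining layers

def sample_strategies(layers, n=10, seed=0):
    """Sample n layer-wise sparsity configurations, one sparsity per layer,
    drawn uniformly from that layer's range (sampling rule assumed)."""
    r = np.random.default_rng(seed)
    return [
        {name: r.uniform(*sparsity_range(name)) for name in layers}
        for _ in range(n)
    ]

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of a layer's weights."""
    if sparsity <= 0:
        return w.copy()
    k = int(round(sparsity * w.size))
    thresh = np.sort(np.abs(w).ravel())[k - 1] if k > 0 else -np.inf
    return np.where(np.abs(w) <= thresh, 0.0, w)

# Ten deployment-ready candidate strategies; apply the first one.
strategies = sample_strategies(layers, n=10)
pruned = {name: magnitude_prune(w, strategies[0][name])
          for name, w in layers.items()}
```

Each sampled strategy yields one compressed model, so the ten configurations trace out the accuracy-sparsity trade-off without repeated sensitivity analysis; the paper additionally uses gradient-based and combined sensitivity signals in place of pure magnitude.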
Problem

Research questions and friction points this paper is trying to address.

model compression
pruning
sparsification
edge deployment
accuracy-sparsity trade-off
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mix-and-Match Pruning
layer-wise sparsification
sensitivity-aware pruning
architecture-aware sparsity
Pareto-optimal compression
👥 Authors
Danial Monachan
Brandenburg Technical University, Cottbus, Germany

Samira Nazari
University of Zanjan, Iran

Mahdi Taheri
Postdoc, BTU Cottbus-Senftenberg
Reliability · Fault Tolerant · Neural Networks · Hardware Acceleration · Approximate Computing

Ali Azarpeyvand
University of Zanjan, Iran

Milos Krstic
Professor, University of Potsdam; Department Head, IHP, Frankfurt (Oder), Germany
GALS · asynchronous circuit design · fault tolerance · radhard design · reliability

Michael Huebner
Brandenburg Technical University, Cottbus, Germany

Christian Herglotz
Brandenburg Technical University, Cottbus, Germany