Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying deep neural networks on edge devices requires aggressive compression while preserving accuracy, yet layers exhibit markedly different sensitivities to pruning, making a single pruning strategy suboptimal. This work proposes a globally guided, layer-wise sparsification framework that systematically generates diverse, high-quality pruning configurations by integrating multiple sensitivity signals—such as weight magnitude and gradient information—with architecture-aware rules, including normalization layer preservation and aggressive pruning of the classification head, without introducing new pruning criteria. The method achieves Pareto-optimal performance across both CNNs and Vision Transformers, reducing accuracy loss by 40% on the Swin-Tiny model compared to single-criterion pruning approaches.

📝 Abstract
Deploying deep neural networks (DNNs) on edge devices requires strong compression with minimal accuracy loss. This paper introduces Mix-and-Match Pruning, a globally guided, layer-wise sparsification framework that leverages sensitivity scores and simple architectural rules to generate diverse, high-quality pruning configurations. The framework addresses a key limitation of single-strategy approaches: different layers and architectures respond differently to pruning, so applying one uniform strategy is suboptimal. Mix-and-Match derives architecture-aware sparsity ranges, e.g., preserving normalization layers while pruning classifiers more aggressively, and systematically samples these ranges to produce ten strategies per sensitivity signal (magnitude, gradient, or their combination). This eliminates repeated pruning runs while offering deployment-ready accuracy-sparsity trade-offs. Experiments on CNNs and Vision Transformers demonstrate Pareto-optimal results, with Mix-and-Match reducing accuracy degradation on Swin-Tiny by 40% relative to standard single-criterion pruning. These findings show that coordinating existing pruning signals yields more reliable and efficient compressed models than introducing new criteria.
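The abstract's pipeline (architecture-aware sparsity ranges, sampled layer-wise strategies, magnitude-based pruning) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the layer names, the specific range values, and the uniform sampling rule are all assumptions made for the example.

```python
# Sketch of globally guided layer-wise sparsification (assumed details, not
# the paper's implementation): each layer gets an architecture-aware sparsity
# range, configurations are sampled from those ranges, and weights are pruned
# by magnitude per layer.
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": layer name -> weight tensor.
layers = {
    "conv1.weight": rng.normal(size=(16, 3, 3, 3)),
    "norm1.weight": rng.normal(size=(16,)),
    "classifier.weight": rng.normal(size=(10, 16)),
}

def sparsity_range(name):
    """Architecture-aware rule (example values): preserve normalization
    layers, prune the classification head more aggressively."""
    if "norm" in name:
        return (0.0, 0.0)   # never prune normalization layers
    if "classifier" in name:
        return (0.5, 0.9)   # aggressive range for the head
    return (0.2, 0.7)       # default range for remaining layers

def sample_strategies(layers, n=10, seed=0):
    """Sample n layer-wise sparsity configurations, one sparsity per layer,
    drawn uniformly from that layer's range (sampling rule assumed)."""
    r = np.random.default_rng(seed)
    return [
        {name: r.uniform(*sparsity_range(name)) for name in layers}
        for _ in range(n)
    ]

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of a layer's weights."""
    if sparsity <= 0:
        return w.copy()
    k = int(round(sparsity * w.size))
    thresh = np.sort(np.abs(w).ravel())[k - 1] if k > 0 else -np.inf
    return np.where(np.abs(w) <= thresh, 0.0, w)

# Ten deployment-ready candidate strategies; apply the first one.
strategies = sample_strategies(layers, n=10)
pruned = {name: magnitude_prune(w, strategies[0][name])
          for name, w in layers.items()}
```

Each sampled strategy yields one compressed model, so the ten configurations trace out the accuracy-sparsity trade-off without repeated sensitivity analysis; the paper additionally uses gradient-based and combined sensitivity signals in place of pure magnitude.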
Problem

Research questions and friction points this paper is trying to address.

model compression
pruning
sparsification
edge deployment
accuracy-sparsity trade-off
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mix-and-Match Pruning
layer-wise sparsification
sensitivity-aware pruning
architecture-aware sparsity
Pareto-optimal compression
👥 Authors
Danial Monachan
Brandenburg Technical University, Cottbus, Germany

Samira Nazari
University of Zanjan, Iran

Mahdi Taheri
Postdoc, BTU Cottbus-Senftenberg
Reliability · Fault Tolerant · Neural Networks · Hardware Acceleration · Approximate Computing

Ali Azarpeyvand
University of Zanjan, Iran

Milos Krstic
Professor, University of Potsdam; Department Head, IHP, Frankfurt (Oder), Germany
GALS · asynchronous circuit design · fault tolerance · radhard design · reliability

Michael Huebner
Brandenburg Technical University, Cottbus, Germany

Christian Herglotz
Brandenburg Technical University, Cottbus, Germany