Combining Relevance and Magnitude for Resource-Aware DNN Pruning

📅 2024-05-21
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenge of balancing accuracy and efficiency in neural network pruning under resource-constrained settings, this paper proposes FlexRel, a dynamic pruning method. FlexRel jointly models parameter magnitude during training and task-specific relevance during inference, the first approach to unify these complementary signals. It quantifies parameter relevance via gradient-based sensitivity analysis, performs magnitude-driven pruning with adaptive thresholds, and incorporates a lightweight online importance recalibration mechanism to harmonize information across both phases. This unified framework enables joint optimization of accuracy and resource efficiency. Experiments demonstrate that, under typical accuracy constraints, FlexRel achieves over 35% bandwidth reduction compared to baseline methods, significantly improves the achievable pruning ratio, enhances generalization stability, and supports efficient deployment on edge devices.
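The combined scoring described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: the function names, the mixing weight `alpha`, the normalization, and the quantile-based adaptive threshold are all assumptions; the relevance term uses first-order Taylor saliency (|w · dL/dw|) as a stand-in for the paper's gradient-based sensitivity analysis.

```python
import numpy as np

def flexrel_style_scores(weights, grads, alpha=0.5):
    """Hypothetical combined importance score mixing magnitude
    (training-time signal) with gradient-based relevance
    (inference-time sensitivity, |w * dL/dw|, i.e. first-order
    Taylor saliency). Illustrative only, not the paper's method."""
    magnitude = np.abs(weights)
    relevance = np.abs(weights * grads)
    # Normalize each signal to [0, 1] so the mixing weight alpha
    # trades them off on a comparable scale.
    magnitude = magnitude / (magnitude.max() + 1e-12)
    relevance = relevance / (relevance.max() + 1e-12)
    return alpha * magnitude + (1.0 - alpha) * relevance

def prune_mask(scores, sparsity=0.7):
    """Adaptive threshold: the cutoff is the sparsity-quantile of the
    scores, so roughly (1 - sparsity) of the parameters are kept."""
    threshold = np.quantile(scores, sparsity)
    return scores > threshold  # True = keep, False = prune

# Toy example on random weights and gradients.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
g = rng.normal(size=1000)
scores = flexrel_style_scores(w, g, alpha=0.5)
mask = prune_mask(scores, sparsity=0.7)  # prune ~70% of parameters
```

A parameter with large magnitude but near-zero gradient sensitivity (or vice versa) gets a moderate score under this mix, which is the intuition behind combining the two signals rather than using either alone.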

πŸ“ Abstract
Pruning neural networks, i.e., removing some of their parameters whilst retaining their accuracy, is one of the main ways to reduce the latency of a machine learning pipeline, especially in resource- and/or bandwidth-constrained scenarios. In this context, the pruning technique, i.e., how to choose the parameters to remove, is critical to the system performance. In this paper, we propose a novel pruning approach, called FlexRel and predicated upon combining training-time and inference-time information, namely, parameter magnitude and relevance, in order to improve the resulting accuracy whilst saving both computational resources and bandwidth. Our performance evaluation shows that FlexRel is able to achieve higher pruning factors, saving over 35% bandwidth for typical accuracy targets.
Problem

Research questions and friction points this paper is trying to address.

Reducing DNN latency via parameter pruning
Optimizing pruning technique for resource efficiency
Balancing accuracy and bandwidth savings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines parameter magnitude and relevance
Improves accuracy while saving resources
Achieves higher pruning factors
C. Chiasserini
Politecnico di Torino, Italy; CNR-IEIIT, Italy; CNIT, Italy; Chalmers University of Technology, Sweden
F. Malandrino
CNR-IEIIT, Italy; CNIT, Italy
Nuria Molner
iTEAM Research Institute, Universitat Politècnica de València, Spain
Zhiqiang Zhao
Politecnico di Torino, Italy