AI Summary
To address the challenge of balancing accuracy and efficiency in neural network pruning under resource-constrained settings, this paper proposes FlexRel, a dynamic pruning method. FlexRel jointly models parameter magnitude during training and task-specific relevance during inference, the first approach to unify these complementary signals. It quantifies parameter relevance via gradient-based sensitivity analysis, performs magnitude-driven pruning with adaptive thresholds, and incorporates a lightweight online importance recalibration mechanism to harmonize information across both phases. This unified framework enables joint optimization of accuracy and resource efficiency. Experiments demonstrate that, under typical accuracy constraints, FlexRel achieves over 35% bandwidth reduction compared to baseline methods, significantly improves pruning ratio, enhances generalization stability, and supports efficient deployment on edge devices.
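The combination of training-time magnitude and gradient-based relevance under an adaptive threshold might be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear mixing coefficient `alpha`, the max-normalization of each signal, and the quantile-based threshold are all assumptions introduced here for concreteness.

```python
import numpy as np

def relevance_scores(weights, grads):
    # Gradient-based sensitivity: |w * dL/dw| is a common first-order
    # (Taylor) estimate of how much the loss changes if w is removed.
    return np.abs(weights * grads)

def combined_prune(weights, grads, alpha=0.5, sparsity=0.5):
    # Mix training-time magnitude with inference-time relevance.
    # alpha and the linear mixing rule are illustrative assumptions,
    # not FlexRel's exact formulation.
    magnitude = np.abs(weights)
    relevance = relevance_scores(weights, grads)
    # Normalize each signal so the two scales are comparable.
    score = (alpha * magnitude / (magnitude.max() + 1e-12)
             + (1 - alpha) * relevance / (relevance.max() + 1e-12))
    # Adaptive threshold: keep the top (1 - sparsity) fraction of scores.
    thresh = np.quantile(score, sparsity)
    mask = score >= thresh
    return weights * mask, mask
```

In this sketch the threshold adapts to the score distribution of each layer rather than being a fixed constant, which is one simple way to realize "magnitude-driven pruning with adaptive thresholds".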
Abstract
Pruning neural networks, i.e., removing some of their parameters whilst retaining their accuracy, is one of the main ways to reduce the latency of a machine learning pipeline, especially in resource- and/or bandwidth-constrained scenarios. In this context, the pruning technique, i.e., how to choose the parameters to remove, is critical to the system performance. In this paper, we propose a novel pruning approach, called FlexRel, which combines training-time and inference-time information, namely, parameter magnitude and relevance, in order to improve the resulting accuracy whilst saving both computational resources and bandwidth. Our performance evaluation shows that FlexRel is able to achieve higher pruning factors, saving over 35% bandwidth for typical accuracy targets.