AI Summary
To address the challenge of balancing accuracy and efficiency in neural network pruning under resource-constrained settings, this paper proposes FlexRel, a dynamic pruning method. FlexRel jointly models parameter magnitude during training and task-specific relevance during inference, the first approach to unify these complementary signals. It quantifies parameter relevance via gradient-based sensitivity analysis, performs magnitude-driven pruning with adaptive thresholds, and incorporates a lightweight online importance recalibration mechanism to harmonize information across both phases. This unified framework enables joint optimization of accuracy and resource efficiency. Experiments demonstrate that, under typical accuracy constraints, FlexRel achieves over 35% bandwidth reduction compared to baseline methods, significantly improves pruning ratio, enhances generalization stability, and supports efficient deployment on edge devices.
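The combination of training-time magnitude and gradient-based relevance under an adaptive threshold might be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear mixing coefficient `alpha`, the max-normalization of each signal, and the quantile-based threshold are all assumptions introduced here for concreteness.

```python
import numpy as np

def relevance_scores(weights, grads):
    # Gradient-based sensitivity: |w * dL/dw| is a common first-order
    # (Taylor) estimate of how much the loss changes if w is removed.
    return np.abs(weights * grads)

def combined_prune(weights, grads, alpha=0.5, sparsity=0.5):
    # Mix training-time magnitude with inference-time relevance.
    # alpha and the linear mixing rule are illustrative assumptions,
    # not FlexRel's exact formulation.
    magnitude = np.abs(weights)
    relevance = relevance_scores(weights, grads)
    # Normalize each signal so the two scales are comparable.
    score = (alpha * magnitude / (magnitude.max() + 1e-12)
             + (1 - alpha) * relevance / (relevance.max() + 1e-12))
    # Adaptive threshold: keep the top (1 - sparsity) fraction of scores.
    thresh = np.quantile(score, sparsity)
    mask = score >= thresh
    return weights * mask, mask
```

In this sketch the threshold adapts to the score distribution of each layer rather than being a fixed constant, which is one simple way to realize "magnitude-driven pruning with adaptive thresholds".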
Abstract
Pruning neural networks, i.e., removing some of their parameters whilst retaining their accuracy, is one of the main ways to reduce the latency of a machine learning pipeline, especially in resource- and/or bandwidth-constrained scenarios. In this context, the pruning technique, i.e., how to choose the parameters to remove, is critical to the system performance. In this paper, we propose a novel pruning approach, called FlexRel, which combines training-time and inference-time information, namely, parameter magnitude and relevance, in order to improve the resulting accuracy whilst saving both computational resources and bandwidth. Our performance evaluation shows that FlexRel is able to achieve higher pruning factors, saving over 35% bandwidth for typical accuracy targets.