Flexible Automatic Identification and Removal (FAIR)-Pruner: An Efficient Neural Network Pruning Method

📅 2025-08-04
📈 Citations: 0
🤖 AI Summary
To address the challenge of deploying large-scale neural networks on resource-constrained edge devices, this paper proposes FAIR-Pruner—a one-shot, structured pruning method that requires no post-pruning fine-tuning. Methodologically, it jointly leverages a Wasserstein-distance-driven unit utilization score and a Taylor-expansion-based reconstruction error to quantify both a unit's importance and the accuracy impact of removing it. It further introduces a Tolerance of Difference mechanism that automatically determines layer-wise pruning ratios, enabling flexible, configuration-free sparsity allocation. Evaluated on benchmarks including ImageNet with architectures such as VGG, FAIR-Pruner achieves significant reductions in model size and computational cost (FLOPs) while incurring negligible accuracy degradation. Crucially, it eliminates the need for any fine-tuning—enhancing deployment efficiency, automation, and practicality for edge scenarios.

📝 Abstract
Neural network pruning is a critical compression technique that facilitates the deployment of large-scale neural networks on resource-constrained edge devices, typically by identifying and eliminating redundant or insignificant parameters to reduce computational and memory overhead. This paper proposes the Flexible Automatic Identification and Removal (FAIR)-Pruner, a novel method for neural network structured pruning. Specifically, FAIR-Pruner first evaluates the importance of each unit (e.g., neuron or channel) through the Utilization Score, quantified by the Wasserstein distance. To reflect the performance degradation after unit removal, it then introduces the Reconstruction Error, which is computed via the Taylor expansion of the loss function. Finally, FAIR-Pruner identifies superfluous units with negligible impact on model performance by controlling the proposed Tolerance of Difference, which measures the difference between units that are unimportant and units whose removal causes performance degradation. A major advantage of FAIR-Pruner lies in its capacity to automatically determine layer-wise pruning rates, which yields a more efficient subnetwork structure than applying a uniform pruning rate. Another advantage is its strong one-shot performance without post-pruning fine-tuning. Furthermore, with utilization scores and reconstruction errors, users can flexibly obtain pruned models under different pruning ratios. Comprehensive experimental validation on diverse benchmark datasets (e.g., ImageNet) and various neural network architectures (e.g., VGG) demonstrates that FAIR-Pruner achieves significant model compression while maintaining high accuracy.
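The three ingredients the abstract describes can be sketched in a few lines. This is an illustrative reading, not the paper's formulas: the choice of baseline distribution, the per-unit 1-D Wasserstein distance, the first-order Taylor term, and the linear tolerance-scaled threshold below are all assumptions made for the sketch.

```python
import numpy as np

def utilization_score(activations, baseline):
    """Sketch of the Utilization Score: 1-D Wasserstein-1 distance between
    a unit's activation samples and a reference (baseline) distribution.
    For equal-size samples this equals the mean absolute difference of the
    sorted samples. A unit close to the baseline (low score) is a pruning
    candidate under this reading."""
    a = np.sort(np.asarray(activations, dtype=float))
    b = np.sort(np.asarray(baseline, dtype=float))
    return float(np.mean(np.abs(a - b)))

def reconstruction_error(grad, activation):
    """Sketch of the Reconstruction Error: first-order Taylor estimate of
    the loss change when the unit's output is zeroed, |dL/da * a|,
    averaged over samples."""
    return float(np.mean(np.abs(np.asarray(grad) * np.asarray(activation))))

def prune_mask(scores, errors, tolerance):
    """Sketch of the Tolerance of Difference: mark a unit prunable when
    both its utilization score and its reconstruction error fall below a
    tolerance-scaled threshold between the layer's min and max values.
    The linear thresholding rule here is an assumption."""
    s, e = np.asarray(scores, dtype=float), np.asarray(errors, dtype=float)
    s_thr = s.min() + tolerance * (s.max() - s.min())
    e_thr = e.min() + tolerance * (e.max() - e.min())
    return (s <= s_thr) & (e <= e_thr)
```

Under this sketch, raising `tolerance` prunes more aggressively, and because each layer's thresholds are set from that layer's own score range, the effective pruning rate varies per layer rather than being uniform—mirroring the automatic layer-wise allocation the paper claims.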
Problem

Research questions and friction points this paper is trying to address.

Efficiently prunes neural networks for edge devices
Automatically determines layer-wise pruning rates
Maintains high accuracy without fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Wasserstein distance for unit importance evaluation
Employs Taylor expansion to compute reconstruction error
Automatically determines layer-wise pruning rates
Chenqing Lin
School of Statistics and Mathematics, Zhejiang Gongshang University, China
Mostafa Hussien
École de Technologie Supérieure (ÉTS), University of Quebec, Canada
Chengyao Yu
Southern University of Science & Technology
Statistics, Uncertainty Quantification
Mohamed Cheriet
Full Professor, ÉTS (U. of Quebec), Former Canada Research Chair, SYNCHROMEDIA Lab, CIRODD
Pattern Recognition, Machine Learning, Image Processing, Sustainable Intell Cloud & Networks, Energy
Osama Abdelrahman
Department of Statistics, Mathematics, and Insurance, Faculty of Commerce, Assiut University, Asyut, Egypt
Ruixing Ming
School of Statistics and Mathematics, Zhejiang Gongshang University, China