ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs

📅 2025-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing semi-structured pruning methods for large language model (LLM) deployment suffer from suboptimal accuracy-efficiency trade-offs because they rely on layer-wise heuristic rules and lack global optimization. To address this, we propose a differentiable sparse-mask learning framework based on regularized optimization. Our approach formulates mask selection as an end-to-end differentiable regularization problem that requires no weight updates, eliminating hard intra-layer constraints and enabling globally aware, progressive 2:4 semi-structured pruning. By jointly optimizing differentiable sparsity modeling and gradient-driven mask learning, the method introduces no additional parameters or training overhead. Evaluated on seven mainstream LLMs, it achieves significantly higher inference speedup (average +1.8×) and substantially reduced accuracy degradation (35–62% lower loss) compared to state-of-the-art baselines, demonstrating superior efficiency and robustness.
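For context on what the 2:4 semi-structured constraint means, here is a minimal sketch of the magnitude-based heuristic baseline that such learned approaches aim to improve on: in every group of 4 consecutive weights, exactly 2 are kept (the largest in magnitude) and 2 are zeroed. The function name and shapes are illustrative, not from the paper.

```python
import numpy as np

def apply_2to4_mask(weights):
    """Zero the 2 smallest-magnitude entries in every group of 4
    consecutive weights, producing a 2:4 semi-structured pattern.
    This is the local magnitude heuristic that learned mask
    selection methods seek to outperform (illustrative sketch)."""
    w = np.asarray(weights, dtype=float).reshape(-1, 4)
    # per group of four, indices of the two largest-magnitude entries
    keep = np.argsort(np.abs(w), axis=1)[:, 2:]
    mask = np.zeros_like(w)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return (w * mask).reshape(-1)

row = [0.9, -0.1, 0.05, -1.2,  0.3, 0.2, -0.7, 0.01]
pruned = apply_2to4_mask(row)
# exactly 2 nonzeros survive in each group of 4
```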

📝 Abstract
Large Language Models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. Semi-structured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback. We present ProxSparse, a learning-based framework for mask selection enabled by regularized optimization. ProxSparse transforms the rigid, non-differentiable mask selection process into a smoother optimization procedure, allowing gradual and flexible mask exploration. ProxSparse does not involve additional weight updates once the mask is determined. Our extensive evaluations on 7 widely used models show that ProxSparse consistently and significantly outperforms previously proposed semi-structured mask selection methods, demonstrating the effectiveness of our learned approach to semi-structured pruning.
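The name ProxSparse and the phrase "smoother optimization procedure" suggest a proximal-style relaxation: instead of committing to a hard 2:4 mask in one shot, a regularizer gradually shrinks the entries slated for removal toward zero. The sketch below is a hypothetical illustration of that idea (the paper's actual regularizer and proximal operator may differ): one step soft-thresholds only the two smallest-magnitude entries in each group of four, so repeated steps with a growing penalty progressively reveal a 2:4 pattern.

```python
import numpy as np

def prox_2to4_step(m, lam):
    """One illustrative proximal step toward 2:4 sparsity:
    soft-threshold the two smallest-magnitude entries in each
    group of 4 by lam, leaving the two largest untouched.
    Hypothetical sketch, not the paper's exact operator."""
    g = np.array(m, dtype=float).reshape(-1, 4)   # copy, groups of 4
    order = np.argsort(np.abs(g), axis=1)         # smallest-magnitude first
    shrink_idx = order[:, :2]                     # two smallest per group
    vals = np.take_along_axis(g, shrink_idx, axis=1)
    # soft-thresholding: shrink magnitude by lam, clip at zero
    vals = np.sign(vals) * np.maximum(np.abs(vals) - lam, 0.0)
    np.put_along_axis(g, shrink_idx, vals, axis=1)
    return g.reshape(-1)

m = np.array([0.9, -0.1, 0.05, -1.2])
out = prox_2to4_step(m, lam=0.08)
# the two large entries survive; the two small ones shrink toward zero
```

Iterating this step while increasing `lam` mimics the "gradual mask exploration" the abstract describes: small entries fade out progressively rather than being cut by a one-shot layer-wise rule.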
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Pruning Methods
Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

ProxSparse
Sparse Mask Learning
Smoothed Optimization for Pruning