CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search

📅 2026-05-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work targets the high memory and computational costs that hinder large language model deployment by proposing CRePE, an efficient post-training pruning framework. CRePE adds two-dimensional local neighborhood context to relative importance scoring, which RIA computes only from row and column sums, and replaces RIA's equal row/column weighting with adaptive coefficients. To accelerate the coefficient search, the authors propose PHO (Proxy-based Hyperparameter Optimization), which avoids repeated perplexity evaluations and cuts search time from approximately 11 hours to approximately 20 minutes, while the configuration found on one model transfers well to other models. CRePE can also be combined with orthogonal techniques such as Channel Permutation, non-uniform sparsity allocation, and re-pruning, and it consistently outperforms existing post-training pruning methods across diverse models and sparsity settings.
📝 Abstract
Deploying Large Language Models (LLMs) in practice incurs substantial memory and computational costs. Post-training pruning (PTP) is an effective approach to reducing these costs by removing weights without additional training. Among existing methods, RIA introduces relative importance scores normalized by row and column sums, achieving state-of-the-art accuracy. However, RIA considers only 1D cross-shaped (row/column) directional information and assigns equal weight to row and column contributions. In this paper, we propose CRePE, which incorporates 2D local neighborhood context and adaptive coefficients into Relative Importance scoring. CRePE consistently outperforms existing PTP methods across diverse models and sparsity settings. However, identifying optimal adaptive coefficients via perplexity (PPL)-based hill climbing requires numerous PPL evaluations and approximately 11 hours of search time. To address this, we propose PHO (Proxy-based Hyperparameter Optimization), which eliminates the need for repeated PPL measurements and reduces the search time to approximately 20 minutes. Furthermore, the optimal hyperparameter configuration found by PHO on one model transfers well to other models, demonstrating strong generalization. Finally, we verify that CRePE can be orthogonally combined with existing techniques including Channel Permutation, non-uniform sparsity allocation, and re-pruning methods.
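The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' formula, which this page does not give. The function names, the coefficients `alpha`, `beta`, and `gamma`, the window `radius`, and the way the local term is combined with the row/column term are all assumptions made for illustration. Any input-activation factor that a real pruning score might include is left out.

```python
def relative_importance(W, alpha=1.0, beta=1.0):
    """Row/column-normalized importance of each |W[i][j]|.

    alpha and beta are hypothetical coefficients; alpha == beta == 1
    corresponds to the equal row/column weighting the abstract
    attributes to RIA.
    """
    A = [[abs(w) for w in row] for row in W]
    row_sums = [sum(row) or 1.0 for row in A]        # guard all-zero rows
    col_sums = [sum(col) or 1.0 for col in zip(*A)]  # guard all-zero columns
    return [
        [alpha * a / row_sums[i] + beta * a / col_sums[j]
         for j, a in enumerate(row)]
        for i, row in enumerate(A)
    ]


def local_share(W, radius=1):
    """Share of |W[i][j]| inside its (2*radius+1)^2 window.

    This is an assumed stand-in for "2D local neighborhood context".
    """
    A = [[abs(w) for w in row] for row in W]
    n, m = len(A), len(A[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = sum(
                A[r][c]
                for r in range(max(0, i - radius), min(n, i + radius + 1))
                for c in range(max(0, j - radius), min(m, j + radius + 1))
            )
            out[i][j] = A[i][j] / total if total else 0.0
    return out


def combined_score(W, alpha=1.0, beta=1.0, gamma=0.5, radius=1):
    """Hypothetical combination: row/column term plus gamma * local term."""
    rel = relative_importance(W, alpha, beta)
    loc = local_share(W, radius)
    return [[r + gamma * l for r, l in zip(rr, ll)]
            for rr, ll in zip(rel, loc)]


def prune_lowest(W, scores, sparsity):
    """Zero out the lowest-scoring fraction `sparsity` of the weights."""
    flat = sorted(
        (scores[i][j], i, j)
        for i in range(len(W)) for j in range(len(W[0]))
    )
    pruned = [list(row) for row in W]
    for _, i, j in flat[: int(sparsity * len(flat))]:
        pruned[i][j] = 0.0
    return pruned


# Usage: prune half of a toy 2x2 weight matrix.
W = [[1.0, -1.0], [2.0, 2.0]]
pruned = prune_lowest(W, combined_score(W), sparsity=0.5)
```

In this sketch, a coefficient search of the kind the abstract describes would vary `alpha`, `beta`, and `gamma` and compare the resulting pruned models. The objective used for that comparison (perplexity, or the proxy that PHO substitutes) is not specified here.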
Problem

Research questions and friction points this paper is trying to address.

Post-training pruning
Relative importance
Large Language Models
Hyperparameter optimization
Model compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

CRePE
post-training pruning
relative importance
PHO
LLM compression