🤖 AI Summary
State-space model (SSM)-based large language models exhibit unexpected robustness against conventional weight pruning methods (e.g., WANDA), hindering effective structural compression.
Method: We systematically evaluate the generalizability and robustness of diverse pruning strategies—including weight-based approaches (WANDA, magnitude pruning, SNIP), inter-layer sparsification, and task-adaptive retraining—across four representative SSM-LLMs.
Contribution/Results: We uncover a previously unknown sensitivity pattern in SSM parameters, identifying the selective state matrix as the most vulnerable module. Building on this insight, we propose an SSM-aware pruning paradigm that achieves up to 50% parameter reduction with zero accuracy degradation across multiple downstream tasks—marking the first demonstration of lossless, high-ratio structured pruning for SSM-LLMs and resolving a key bottleneck in their lightweight deployment.
📝 Abstract
Recent work proposed state-space models (SSMs) as an efficient alternative to transformer-based LLMs. Can these models be pruned to further reduce their computational cost? We adapt several pruning methods to the SSM structure and apply them to four SSM-based LLMs across multiple tasks. We find that such models are quite robust to some pruning methods (e.g., WANDA), while other methods lead to rapid performance degradation.