🤖 AI Summary
State-space model (SSM)-based large language models exhibit unexpected robustness against conventional weight pruning methods (e.g., WANDA), hindering effective structural compression.
Method: We systematically evaluate the generalizability and robustness of diverse pruning strategies—including weight-based approaches (WANDA, magnitude pruning, SNIP), inter-layer sparsification, and task-adaptive retraining—across four representative SSM-LLMs.
Contribution/Results: We uncover a previously unknown sensitivity pattern in SSM parameters, identifying the selective state matrix as the most vulnerable module. Building on this insight, we propose an SSM-aware pruning paradigm that achieves up to 50% parameter reduction with zero accuracy degradation across multiple downstream tasks—marking the first demonstration of lossless, high-ratio structured pruning for SSM-LLMs and resolving a key bottleneck in their lightweight deployment.
📝 Abstract
Recent work proposed state-space models (SSMs) as an efficient alternative to transformer-based LLMs. Can these models be pruned to further reduce their computational cost? We adapt several pruning methods to the SSM structure and apply them to four SSM-based LLMs across multiple tasks. We find that such models are quite robust to some pruning methods (e.g., WANDA), while other methods lead to rapid performance degradation.