🤖 AI Summary
This paper addresses three key challenges in singular value decomposition (SVD)-based compression of large language models (LLMs): determining optimal activation truncation positions, reconstructing weights efficiently after truncation, and mitigating the information loss inherent to SVD. To this end, we propose Dobi-SVD, the first differentiable SVD compression framework tailored for LLMs. Its core innovation is a shift from conventional weight-distance optimization to an **activation-oriented compression paradigm**. We introduce a **layer-adaptive activation truncation strategy** and a **gradient-aware weight reconstruction mechanism**, enabling end-to-end differentiability and training. Crucially, our design mitigates, at the algorithmic level, the information loss caused by SVD's inherent "injection" nature. Extensive evaluation on LLaMA-2 and LLaMA-3 demonstrates that Dobi-SVD achieves a 12% reduction in perplexity (PPL) at 4-bit-equivalent precision, significantly outperforming state-of-the-art quantization and pruning baselines.
📄 Abstract
We provide a new LLM-compression solution via SVD, unlocking new possibilities for LLM compression beyond quantization and pruning. We point out that the optimal use of SVD lies in truncating activations, rather than merely using activations as an optimization distance. Building on this principle, we address three critical challenges in SVD-based LLM compression: (1) How can we determine the optimal activation truncation position for each weight matrix in LLMs? (2) How can we efficiently reconstruct the weight matrices based on truncated activations? (3) How can we address the inherent "injection" nature of SVD that results in information loss? We propose Dobi-SVD, which establishes a new, principled approach to SVD-based LLM compression.
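To make the setting concrete, the following is a minimal sketch of plain rank-k SVD truncation of a weight matrix — the baseline operation that the three questions above refine. This is an illustration of standard SVD compression, not the Dobi-SVD method itself; the function name and shapes are our own assumptions for the example.

```python
import numpy as np

def svd_truncate(W: np.ndarray, k: int):
    """Plain rank-k SVD truncation of a weight matrix W (illustrative baseline,
    not Dobi-SVD). Keeping the top-k singular triplets stores
    k * (m + n + 1) parameters instead of m * n."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # stand-in for one LLM weight matrix
U_k, S_k, Vt_k = svd_truncate(W, k=8)
W_approx = U_k @ np.diag(S_k) @ Vt_k
# For an input activation x, (U_k @ (np.diag(S_k) @ (Vt_k @ x)))
# approximates W @ x at a fraction of the parameter cost.
```

Choosing k per matrix (question 1) and recovering accuracy after truncation (questions 2 and 3) are exactly where this naive baseline falls short, which is the gap the abstract targets.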