🤖 AI Summary
Addressing two key challenges in large language model (LLM) compression—difficult inter-layer rank selection and persistent parameter redundancy—this paper proposes a physics-inspired SVD-based low-rank compression framework. Methodologically: (1) It relaxes discrete truncation rank selection into a continuous optimization problem via the Fermi function and employs FermiGrad, a tailored gradient algorithm, to achieve globally optimal layer-wise rank allocation; (2) Leveraging gauge freedom in weight parameterization, it introduces PivGa—a pivot-based gauge-fixing technique—to eliminate redundancy losslessly. The work is the first to integrate statistical physics principles—specifically Fermi-Dirac statistics—into LLM compression optimization, bridging theoretical rigor with practical efficacy. Experiments demonstrate that the method substantially reduces parameter count and computational cost while preserving model performance, outperforming state-of-the-art baselines in compression efficiency.
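The gauge-freedom idea behind PivGa can be illustrated with a toy example. The paper's exact pivot-based procedure is not spelled out here, so the following is a minimal sketch of the underlying principle: for a factorization W = AB and any invertible G, (AG)(G⁻¹B) = AB, so the gauge can be fixed by normalizing an invertible r×r block of B to the identity; those r² entries then carry no information and need not be stored. The pivot choice `piv` below is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 8, 10, 3
A = rng.standard_normal((m, r))
B = rng.standard_normal((r, n))
W = A @ B                        # rank-r weight matrix

# Gauge freedom: (A G)(G^{-1} B) = A B for any invertible G.
# Hypothetical pivot-style gauge fix: pick r columns of B that form
# an invertible r x r block and transform it into the identity.
piv = [0, 1, 2]                  # assumed-independent pivot columns
G = B[:, piv]                    # invertible almost surely for random B
A2 = A @ G                       # new left factor
B2 = np.linalg.solve(G, B)       # new right factor; B2[:, piv] == I

assert np.allclose(A2 @ B2, W)             # compression is lossless
assert np.allclose(B2[:, piv], np.eye(r))  # r*r entries fixed -> not stored
```

The identity block removes r² stored parameters per factor pair at zero reconstruction error, which is the sense in which this extra compression is lossless.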
📝 Abstract
Large Language Models (LLMs) place heavy demands on computational resources. Low-rank decomposition of LLM weights, e.g. via Singular Value Decomposition (SVD), is a promising approach to LLM compression, but it presents several practical hurdles, such as selecting appropriate layer-wise ranks and eliminating the parameter redundancy of the low-rank factors. In this work, we present two physics-inspired improvements to SVD-based LLM compression: (1) **FermiGrad**, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) **PivGa**, an additional *lossless* compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.
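The Fermi-function relaxation can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: each singular value is multiplied by a Fermi-Dirac gate 1/(1 + exp((i - μ)/T)), so the hard cutoff index becomes a continuous parameter μ (with temperature T) that a gradient method like the proposed FermiGrad could optimize across layers; the names `mu` and `T` here are illustrative.

```python
import numpy as np

def fermi(i, mu, T):
    # Fermi-Dirac occupation: ~1 for i << mu, ~0 for i >> mu
    return 1.0 / (1.0 + np.exp((i - mu) / T))

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
U, s, Vt = np.linalg.svd(W, full_matrices=False)

mu, T = 16.0, 2.0                       # continuous "rank" and temperature
g = fermi(np.arange(len(s)), mu, T)     # soft gates on the singular values
W_soft = (U * (g * s)) @ Vt             # softly truncated reconstruction

# The gate sum acts as a differentiable effective rank (~mu here);
# as T -> 0 the gates harden into an exact rank-floor(mu) truncation.
effective_rank = g.sum()
```

Because `W_soft` and `effective_rank` are smooth in μ, a parameter-budget constraint and the reconstruction error can be traded off by gradient descent instead of a discrete per-layer rank search.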