🤖 AI Summary
Addressing two key challenges in large language model (LLM) compression—difficult inter-layer rank selection and persistent parameter redundancy—this paper proposes a physics-inspired SVD-based low-rank compression framework. Methodologically: (1) It relaxes discrete truncation rank selection into a continuous optimization problem via the Fermi function and employs FermiGrad, a tailored gradient algorithm, to achieve globally optimal layer-wise rank allocation; (2) Leveraging gauge freedom in weight parameterization, it introduces PivGa—a pivot-based gauge-fixing technique—to eliminate redundancy losslessly. The work is the first to integrate statistical physics principles—specifically Fermi-Dirac statistics—into LLM compression optimization, bridging theoretical rigor with practical efficacy. Experiments demonstrate that the method substantially reduces parameter count and computational cost while preserving model performance, outperforming state-of-the-art baselines in compression efficiency.
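The gauge-freedom idea behind PivGa can be illustrated with a toy example. The paper's exact pivot-based procedure is not spelled out here, so the following is a minimal sketch of the underlying principle: for a factorization W = AB and any invertible G, (AG)(G⁻¹B) = AB, so the gauge can be fixed by normalizing an invertible r×r block of B to the identity; those r² entries then carry no information and need not be stored. The pivot choice `piv` below is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 8, 10, 3
A = rng.standard_normal((m, r))
B = rng.standard_normal((r, n))
W = A @ B                        # rank-r weight matrix

# Gauge freedom: (A G)(G^{-1} B) = A B for any invertible G.
# Hypothetical pivot-style gauge fix: pick r columns of B that form
# an invertible r x r block and transform it into the identity.
piv = [0, 1, 2]                  # assumed-independent pivot columns
G = B[:, piv]                    # invertible almost surely for random B
A2 = A @ G                       # new left factor
B2 = np.linalg.solve(G, B)       # new right factor; B2[:, piv] == I

assert np.allclose(A2 @ B2, W)             # compression is lossless
assert np.allclose(B2[:, piv], np.eye(r))  # r*r entries fixed -> not stored
```

The identity block removes r² stored parameters per factor pair at zero reconstruction error, which is the sense in which this extra compression is lossless.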
📝 Abstract
Large Language Models (LLMs) place heavy demands on computational resources. Low-rank decomposition of LLM weights, e.g. via Singular Value Decomposition (SVD), is a promising approach to LLM compression, but it presents several practical hurdles, such as selecting appropriate layer-wise ranks and eliminating the parameter redundancy of the low-rank factors. In this work, we present two physics-inspired improvements to SVD-based LLM compression: (1) **FermiGrad**, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) **PivGa**, an additional *lossless* compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.
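The Fermi-function relaxation can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: each singular value is multiplied by a Fermi-Dirac gate 1/(1 + exp((i - μ)/T)), so the hard cutoff index becomes a continuous parameter μ (with temperature T) that a gradient method like the proposed FermiGrad could optimize across layers; the names `mu` and `T` here are illustrative.

```python
import numpy as np

def fermi(i, mu, T):
    # Fermi-Dirac occupation: ~1 for i << mu, ~0 for i >> mu
    return 1.0 / (1.0 + np.exp((i - mu) / T))

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
U, s, Vt = np.linalg.svd(W, full_matrices=False)

mu, T = 16.0, 2.0                       # continuous "rank" and temperature
g = fermi(np.arange(len(s)), mu, T)     # soft gates on the singular values
W_soft = (U * (g * s)) @ Vt             # softly truncated reconstruction

# The gate sum acts as a differentiable effective rank (~mu here);
# as T -> 0 the gates harden into an exact rank-floor(mu) truncation.
effective_rank = g.sum()
```

Because `W_soft` and `effective_rank` are smooth in μ, a parameter-budget constraint and the reconstruction error can be traded off by gradient descent instead of a discrete per-layer rank search.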