Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing two key challenges in large language model (LLM) compression—difficult inter-layer rank selection and persistent parameter redundancy—this paper proposes a physics-inspired SVD-based low-rank compression framework. Methodologically: (1) It relaxes discrete truncation rank selection into a continuous optimization problem via the Fermi function and employs FermiGrad, a tailored gradient algorithm, to achieve globally optimal layer-wise rank allocation; (2) Leveraging gauge freedom in weight parameterization, it introduces PivGa—a pivot-based gauge-fixing technique—to eliminate redundancy losslessly. The work is the first to integrate statistical physics principles—specifically Fermi-Dirac statistics—into LLM compression optimization, bridging theoretical rigor with practical efficacy. Experiments demonstrate that the method substantially reduces parameter count and computational cost while preserving model performance, outperforming state-of-the-art baselines in compression efficiency.
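The rank-selection idea described above can be illustrated with a small sketch: each singular value is multiplied by a Fermi-Dirac occupation factor whose chemical potential mu acts as a continuous rank, and mu is optimized by gradient descent on a loss that trades reconstruction error against effective rank. The toy matrix, the penalty weight lam, the temperature T, and the finite-difference optimizer below are illustrative assumptions only, not the paper's FermiGrad algorithm.

```python
import numpy as np

def fermi_gate(idx, mu, T):
    """Fermi-Dirac occupation: ~1 for indices well below mu, ~0 well above,
    interpolating smoothly over a width set by the temperature T."""
    return 1.0 / (1.0 + np.exp((idx - mu) / T))

def soft_truncation_loss(W, U, s, Vt, mu, T, lam):
    """Frobenius reconstruction error of the softly truncated SVD plus a penalty
    on the effective rank (sum of occupations), a differentiable size surrogate."""
    f = fermi_gate(np.arange(s.size), mu, T)
    W_soft = (U * (s * f)) @ Vt                 # singular values gated by f
    return np.linalg.norm(W - W_soft) ** 2 + lam * f.sum()

# Toy weight matrix with a decaying spectrum (its singular values are `spectrum`).
rng = np.random.default_rng(0)
m, n = 64, 128
Q1, _ = np.linalg.qr(rng.standard_normal((m, m)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = 10.0 * np.exp(-np.arange(m) / 8.0)
W = (Q1 * spectrum) @ Q2[:m, :]

U, s, Vt = np.linalg.svd(W, full_matrices=False)
mu, T, lam, lr, eps = s.size / 2.0, 2.0, 1.0, 0.1, 1e-3

# Descend on the continuous rank variable mu with a finite-difference gradient
# (an illustrative optimizer only; FermiGrad is the paper's dedicated scheme).
for _ in range(2000):
    g = (soft_truncation_loss(W, U, s, Vt, mu + eps, T, lam)
         - soft_truncation_loss(W, U, s, Vt, mu - eps, T, lam)) / (2 * eps)
    mu -= lr * g

hard_rank = int((fermi_gate(np.arange(s.size), mu, T) > 0.5).sum())
print(f"continuous rank mu = {mu:.2f}, hard rank = {hard_rank}")
```

In a full pipeline the same relaxation would be applied jointly across all layers, so the gradient balances each layer's reconstruction error against a global parameter budget rather than a single matrix, as in the layer-wise allocation the summary describes.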

📝 Abstract
Large Language Models (LLMs) are highly demanding in terms of computational resources. Low-rank decompositions of LLM weights, e.g. via Singular Value Decomposition (SVD), are a promising approach to LLM compression, but present several practical hurdles, e.g. selecting appropriate layer-wise ranks and eliminating residual parameter redundancy. In this work, we present two physics-inspired improvements to SVD-based LLM compression: (1) FermiGrad, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) PivGa, an additional lossless compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.
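The gauge freedom mentioned in the abstract can be made concrete with a minimal NumPy sketch, assuming a factorization W ≈ A B with A of shape m×r and B of shape r×n: any invertible r×r matrix G leaves the product invariant under A → A G, B → G⁻¹ B, so the gauge can be fixed losslessly, e.g. by making an r×r pivot block of A the identity and dropping it from storage. The pivot-selection rule below (column-pivoted QR) is an illustrative stand-in; the paper's actual PivGa procedure is not reproduced here.

```python
import numpy as np
from scipy.linalg import qr

def gauge_fix_pivot_block(A, B, pivot_rows):
    """Apply a gauge transform G so that the r chosen pivot rows of A become the
    r x r identity. The product A @ B is unchanged, but the identity block no
    longer needs to be stored, removing r*r redundant parameters."""
    G = np.linalg.inv(A[pivot_rows, :])      # r x r block, assumed invertible
    return A @ G, np.linalg.solve(G, B)      # (A G, G^{-1} B)

rng = np.random.default_rng(0)
m, n, r = 100, 80, 8
A = rng.standard_normal((m, r))              # low-rank factors of one weight matrix
B = rng.standard_normal((r, n))

# Pick r rows of A forming a well-conditioned block via column-pivoted QR of A^T
# (an illustrative stand-in for however PivGa chooses its pivots).
_, _, piv = qr(A.T, pivoting=True)
pivot_rows = np.sort(piv[:r])

A_fixed, B_fixed = gauge_fix_pivot_block(A, B, pivot_rows)
assert np.allclose(A @ B, A_fixed @ B_fixed)              # lossless: product unchanged
assert np.allclose(A_fixed[pivot_rows, :], np.eye(r))     # these r*r entries are implicit
```

Under such a convention each factorized matrix sheds r² stored entries, matching the dimension of the gauge group GL(r) that acts on the factor pair.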
Problem

Research questions and friction points this paper is trying to address.

Optimizes SVD compression for LLMs via physics-inspired rank selection
Addresses layer-wise rank selection challenges in LLM compression
Eliminates parameter redundancy in low-rank LLM decompositions
Innovation

Methods, ideas, or system contributions that make the work stand out.

FermiGrad uses Fermi function for continuous rank optimization
PivGa removes gauge redundancy in low-rank factors losslessly
Globally optimized SVD compression via physics-inspired improvements
Roman Rausch
Multiverse Computing, Paseo de Miramón 170, Planta 2, 20014 Donostia, Spain
David Jansen
ICFO
Sukhbinder Singh
Multiverse Computing, 192 Spadina Avenue, Toronto, ON M5T 2C2, Canada
Román Orús
Multiverse Computing, Paseo de Miramón 170, Planta 2, 20014 Donostia, Spain; Donostia International Physics Center, San Sebastián, Spain; Ikerbasque Foundation for Science, Bilbao, Spain