Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory

📅 2026-02-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses two core challenges of large language models: reliability failures, such as hallucination and fragile generalization under distribution shift, and high computational and energy costs. It proposes a unified analytical framework grounded in spectral geometry and random matrix theory. By characterizing the eigenvalue dynamics of hidden activations, the framework distinguishes structured representations from noise. Key contributions are EigenTrack, which combines spectral deviations from the Marchenko–Pastur baseline with a lightweight recurrent classifier to enable real-time detection of hallucinations and out-of-distribution behavior, and RMT-KD, a spectrally guided iterative self-distillation method for hardware-friendly model compression. The approach improves energy efficiency and compression ratios while preserving a dense model architecture and accuracy, and further enables early warning of model failure.
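To illustrate the kind of spectral descriptor the summary mentions, the sketch below computes two statistics of an activation matrix: the spectral entropy of the eigenvalue distribution, and the number of eigenvalues exceeding the Marchenko–Pastur bulk edge (the noise baseline). This is a minimal numpy illustration of the general idea, not the thesis implementation; the function name and the synthetic "activations" are assumptions for the example.

```python
import numpy as np

def spectral_descriptors(acts, sigma2=1.0):
    """Spectral entropy and Marchenko-Pastur outlier count for an
    activation matrix `acts` of shape (n_samples, n_features),
    assumed roughly standardized (zero mean, variance sigma2)."""
    n, p = acts.shape
    cov = acts.T @ acts / n                       # sample covariance
    eig = np.linalg.eigvalsh(cov).clip(min=0.0)   # eigenvalues, ascending
    mp_edge = sigma2 * (1 + np.sqrt(p / n)) ** 2  # MP bulk upper edge
    outliers = int((eig > mp_edge).sum())         # structure beyond noise
    w = eig / eig.sum()                           # normalized spectrum
    entropy = float(-(w * np.log(w + 1e-12)).sum())
    return entropy, outliers

rng = np.random.default_rng(0)
noise = rng.standard_normal((512, 64))            # pure-noise "activations"
h_noise, k_noise = spectral_descriptors(noise)

# add a strong rank-1 "structured" direction on top of the noise
spike = 3.0 * rng.standard_normal((512, 1)) @ rng.standard_normal((1, 64))
h_sig, k_sig = spectral_descriptors(noise + spike)
```

On the structured input, at least one eigenvalue escapes the MP bulk and the spectral entropy drops; EigenTrack, per the summary, tracks the temporal evolution of such statistics with a recurrent classifier rather than thresholding a single snapshot.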

📝 Abstract
This thesis addresses two persistent and closely related challenges in modern deep learning, reliability and efficiency, through a unified framework grounded in Spectral Geometry and Random Matrix Theory (RMT). As deep networks and large language models continue to scale, their internal behavior becomes increasingly opaque, leading to hallucinations, fragile generalization under distribution shift, and growing computational and energy demands. By analyzing the eigenvalue dynamics of hidden activations across layers and inputs, this work shows that spectral statistics provide a compact, stable, and interpretable lens on model behavior, capable of separating structured, causal representations from noise-dominated variability. Within this framework, the first contribution, EigenTrack, introduces a real-time method for detecting hallucinations and out-of-distribution behavior in large language and vision-language models. EigenTrack transforms streaming activations into spectral descriptors such as entropy, variance, and deviations from the Marchenko-Pastur baseline, and models their temporal evolution using lightweight recurrent classifiers, enabling early detection of reliability failures before they appear in model outputs while offering interpretable insight into representation dynamics. The second contribution, RMT-KD, presents a principled approach to compressing deep networks via random matrix theoretic knowledge distillation. By interpreting outlier eigenvalues in activation spectra as carriers of task-relevant information, RMT-KD progressively projects networks onto lower-dimensional subspaces through iterative self-distillation, yielding significantly more compact and energy-efficient models while preserving accuracy and dense, hardware-friendly structure.
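The abstract describes RMT-KD as interpreting outlier eigenvalues as carriers of task-relevant information and projecting the network onto the corresponding subspace. The sketch below shows only that spectral selection step under simplifying assumptions (unit-variance features, a single projection rather than iterative self-distillation); the function name is hypothetical and this is not the authors' code.

```python
import numpy as np

def mp_outlier_projection(acts):
    """Project features onto the subspace spanned by eigenvectors whose
    eigenvalues exceed the Marchenko-Pastur bulk edge, i.e. the directions
    carrying structure beyond what pure noise would produce.

    acts: (n_samples, n_features), assumed roughly unit-variance.
    Returns (projected activations, projection basis)."""
    n, p = acts.shape
    cov = acts.T @ acts / n
    eig, vec = np.linalg.eigh(cov)            # ascending eigenvalues
    mp_edge = (1 + np.sqrt(p / n)) ** 2       # unit-variance MP bulk edge
    keep = eig > mp_edge
    if not keep.any():                        # degenerate case: keep top mode
        keep[-1] = True
    basis = vec[:, keep]                      # (p, k) orthonormal basis
    return acts @ basis, basis

rng = np.random.default_rng(1)
X = rng.standard_normal((1024, 128))          # noise floor
X += 2.0 * rng.standard_normal((1024, 3)) @ rng.standard_normal((3, 128))
Z, basis = mp_outlier_projection(X)           # rank-3 structure is recovered
```

A full RMT-KD-style pipeline would, per the abstract, repeat this reduction layer by layer and retrain the smaller network against the larger one via self-distillation, keeping the result dense and hardware-friendly.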
Problem

Research questions and friction points this paper is trying to address.

reliability
efficiency
large language models
hallucinations
distribution shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random Matrix Theory
Spectral Analysis
Knowledge Distillation
Hallucination Detection
Model Compression