🤖 AI Summary
This work investigates whether the self-attention mechanism in large language models (LLMs) can be rigorously modeled as a two-body spin–bath system to explain repetitive token generation and output biases—phenomena prevalent in models such as GPT-2.
Method: We construct an effective Hamiltonian from the query–key weight matrix, analytically derive its phase-transition boundary, and use it to predict next-token logit distributions. We introduce the *logit gap*—a quantitative measure of phase-transition strength—and validate its statistical and causal role through ablation studies.
Contribution/Results: Across 144 attention heads, the logit gap exhibits a strong negative correlation with empirical token ranking (Pearson’s *r* ≈ −0.70). Ablation experiments confirm its causal influence on generation behavior. This is the first empirical validation of the spin–bath analogy in production-scale LLMs, uncovering a deep correspondence between attention-head dynamics and statistical-mechanical phase transitions—thereby establishing a theoretical foundation for physics-informed interpretability and generative model design.
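The pipeline summarized above can be sketched in miniature. This is a toy illustration, not the paper's exact construction: the weights are random stand-ins for one GPT-2 head's query/key matrices, the effective Hamiltonian is assumed to be the bilinear coupling $W_Q W_K^\top / \sqrt{d}$, and the *logit gap* is taken as the separation between the top two candidate scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one head's query/key weights
# (GPT-2 small dimensions: d_model = 768, d_head = 64).
d_model, d_head = 768, 64
W_Q = rng.normal(scale=0.02, size=(d_model, d_head))
W_K = rng.normal(scale=0.02, size=(d_model, d_head))

# Assumed effective Hamiltonian: the bilinear coupling the head
# applies between token embeddings, H_eff = W_Q W_K^T / sqrt(d_head).
H_eff = W_Q @ W_K.T / np.sqrt(d_head)

# Score a small hypothetical set of candidate next-token embeddings
# against a context vector with the attention-style bilinear form.
context = rng.normal(size=d_model)
candidates = rng.normal(size=(5, d_model))   # 5 invented candidates
logits = candidates @ H_eff @ context

# "Logit gap": separation between the top two candidates. A large gap
# marks the ordered phase in which one token dominates generation.
top2 = np.sort(logits)[::-1][:2]
logit_gap = float(top2[0] - top2[1])
print(logit_gap)
```

A per-head scan over all 144 heads would repeat this with each head's actual extracted weights and a real candidate vocabulary.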
📝 Abstract
The recently proposed physics-based framework by Huo and Johnson~\cite{huo2024capturing} models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena such as repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians we obtain analytic \textit{phase boundaries} and logit-gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation of 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model's empirical token rankings ($r \approx -0.70$, $p < 10^{-3}$). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. This validation not only furnishes a tractable, physics-inspired lens for interpretability but also lays the groundwork for novel generative models, bridging the gap between theoretical condensed matter physics and AI.