🤖 AI Summary
This work investigates whether the self-attention mechanism in large language models (LLMs) can be rigorously modeled as a two-body spin–bath system to explain repetitive token generation and output biases—phenomena prevalent in models such as GPT-2.
Method: We construct an effective Hamiltonian from the query–key weight matrix, analytically derive its phase-transition boundary, and use it to predict next-token logit distributions. We introduce the *logit gap*—a quantitative measure of phase-transition strength—and validate its statistical and causal role through ablation studies.
Contribution/Results: Across 144 attention heads, the logit gap exhibits a strong negative correlation with empirical token ranking (Pearson’s *r* ≈ −0.70). Ablation experiments confirm its causal influence on generation behavior. This is the first empirical validation of the spin–bath analogy in production-scale LLMs, uncovering a deep correspondence between attention-head dynamics and statistical-mechanical phase transitions—thereby establishing a theoretical foundation for physics-informed interpretability and generative model design.
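The pipeline summarized above can be sketched in miniature. This is a toy illustration, not the paper's exact construction: the weights are random stand-ins for one GPT-2 head's query/key matrices, the effective Hamiltonian is assumed to be the bilinear coupling $W_Q W_K^\top / \sqrt{d}$, and the *logit gap* is taken as the separation between the top two candidate scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one head's query/key weights
# (GPT-2 small dimensions: d_model = 768, d_head = 64).
d_model, d_head = 768, 64
W_Q = rng.normal(scale=0.02, size=(d_model, d_head))
W_K = rng.normal(scale=0.02, size=(d_model, d_head))

# Assumed effective Hamiltonian: the bilinear coupling the head
# applies between token embeddings, H_eff = W_Q W_K^T / sqrt(d_head).
H_eff = W_Q @ W_K.T / np.sqrt(d_head)

# Score a small hypothetical set of candidate next-token embeddings
# against a context vector with the attention-style bilinear form.
context = rng.normal(size=d_model)
candidates = rng.normal(size=(5, d_model))   # 5 invented candidates
logits = candidates @ H_eff @ context

# "Logit gap": separation between the top two candidates. A large gap
# marks the ordered phase in which one token dominates generation.
top2 = np.sort(logits)[::-1][:2]
logit_gap = float(top2[0] - top2[1])
print(logit_gap)
```

A per-head scan over all 144 heads would repeat this with each head's actual extracted weights and a real candidate vocabulary.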
📝 Abstract
The recently proposed physics-based framework by Huo and Johnson~\cite{huo2024capturing} models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena such as repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians we obtain analytic \textit{phase boundaries} and logit-gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation of 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model's empirical token rankings ($r \approx -0.70$, $p < 10^{-3}$). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. This validation not only furnishes a tractable, physics-inspired lens for interpretability but also lays the groundwork for novel generative models, bridging the gap between theoretical condensed matter physics and AI.