The Phenomenology of Hallucinations

📅 2026-03-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work argues that although language models internally encode uncertainty signals, these signals are only weakly coupled to the output layer. As a result, the model cannot abstain and instead produces hallucinations. For the first time, the study analyzes this limitation through the lens of representation geometry and topology, showing that uncertainty manifests as fragmented structures in high-dimensional space that lack a unified abstention attractor. The mechanism is validated across diverse model architectures using intrinsic dimension estimation, gradient and Fisher information probing, topological data analysis, and causal interventions. By directly injecting the internal uncertainty signal into the logits, the authors significantly restore the model’s ability to abstain, which suppresses hallucinations and offers a pathway toward more reliable generation.
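The logit-injection idea in the summary can be illustrated with a small sketch. This is not the authors' implementation: the function name `inject_uncertainty`, the probe direction `u_dir`, the abstention token index `abstain_id`, and the gain `alpha` are all hypothetical, and the sketch assumes the uncertainty signal can be read off as a linear projection of a hidden state.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def inject_uncertainty(logits, hidden, u_dir, abstain_id, alpha=1.0):
    """Add a scaled uncertainty score to the abstention token's logit.

    Assumptions (not from the paper): the uncertainty signal is the
    projection of `hidden` onto a unit-normalized probe direction `u_dir`,
    and abstention corresponds to a single token index `abstain_id`.
    """
    u = u_dir / np.linalg.norm(u_dir)      # unit probe direction
    score = float(hidden @ u)              # internal uncertainty readout
    out = np.array(logits, dtype=float)    # copy so the input is untouched
    out[abstain_id] += alpha * score       # couple the signal to the output
    return out

# Toy example: three tokens, token 2 plays the role of "I don't know".
logits = np.array([2.0, 1.0, 0.0])
u_dir = np.array([1.0, 0.0])
uncertain_hidden = np.array([3.0, 0.0])    # large projection on u_dir
confident_hidden = np.array([0.0, 1.0])    # no projection on u_dir

p_uncertain = softmax(inject_uncertainty(logits, uncertain_hidden, u_dir, 2))
p_confident = softmax(inject_uncertainty(logits, confident_hidden, u_dir, 2))
```

In this toy setting the abstention token becomes the most likely output only when the hidden state has a large component along the assumed uncertainty direction; otherwise the original prediction is unchanged.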

📝 Abstract

We show that language models hallucinate not because they fail to detect uncertainty, but because they fail to integrate it into output generation. Across architectures, uncertain inputs are reliably identified, occupying high-dimensional regions with 2-3$\times$ the intrinsic dimensionality of factual inputs. However, this internal signal is weakly coupled to the output layer: uncertainty migrates into low-sensitivity subspaces, becoming geometrically amplified yet functionally silent. Topological analysis shows that uncertainty representations fragment rather than converge to a unified abstention state, while gradient and Fisher probes reveal collapsing sensitivity along the uncertainty direction. Because cross-entropy training provides no attractor for abstention and uniformly rewards confident prediction, associative mechanisms amplify these fractured activations until residual coupling forces a committed output despite internal detection. Causal interventions confirm this account: refusal is restored when uncertainty is directly connected to the logits.
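The abstract does not say which intrinsic dimension estimator is used. As one possible illustration, the sketch below uses the maximum-likelihood form of the TwoNN estimator (ratio of second to first nearest-neighbor distances), applied to synthetic points standing in for hidden-state activations. The function name `twonn_intrinsic_dim` and the synthetic data are assumptions made for this example only.

```python
import numpy as np

def twonn_intrinsic_dim(x):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    For each point, mu = r2 / r1 is the ratio of its second to first
    nearest-neighbor distance; the estimate is N / sum(log(mu)).
    """
    x = np.asarray(x, dtype=float)
    # Pairwise Euclidean distances (fine for a few hundred points).
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # exclude self-distances
    nearest = np.partition(dist, 1, axis=1)[:, :2]
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0                           # drop exact duplicate points
    mu = r2[keep] / r1[keep]
    return keep.sum() / np.sum(np.log(mu))

rng = np.random.default_rng(0)
n, ambient = 500, 10

# Synthetic stand-ins: a 2-D cloud and a 6-D cloud, both embedded in 10-D
# by zero-padding the unused coordinates.
low = np.hstack([rng.uniform(size=(n, 2)), np.zeros((n, ambient - 2))])
high = np.hstack([rng.uniform(size=(n, 6)), np.zeros((n, ambient - 6))])

id_low = twonn_intrinsic_dim(low)
id_high = twonn_intrinsic_dim(high)
ratio = id_high / id_low
```

On real activations, `low` and `high` would be replaced by hidden states collected for factual and uncertain prompts respectively, and `ratio` would be the quantity compared against the 2-3$\times$ figure reported above.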

Problem

Research questions and friction points this paper is trying to address.

hallucination

uncertainty

language models

output generation

abstention

Innovation

Methods, ideas, or system contributions that make the work stand out.

hallucination

uncertainty integration

topological fragmentation