The Phenomenology of Hallucinations

2026-03-14

AI Summary
This work shows that although language models internally encode uncertainty signals, these signals are only weakly coupled to the output layer, so the model fails to abstain and hallucinates instead. For the first time, the study analyzes this limitation through the lens of representation geometry and topology, demonstrating that uncertainty manifests as fragmented structures in high-dimensional space that lack a unified abstention attractor. The mechanism is validated across diverse model architectures using intrinsic dimension estimation, gradient and Fisher information probing, topological data analysis, and causal interventions. By directly injecting the internal uncertainty signal into the logits, the authors significantly restore the model's ability to abstain, suppressing hallucinations and offering a novel pathway toward reliable generation.
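
The summary lists intrinsic dimension estimation among the probes but does not say which estimator is used. As a rough illustration only, the sketch below assumes the TwoNN maximum-likelihood estimator, which infers dimension from the ratio of each point's second and first nearest-neighbour distances. The function names `two_nn_dimension` and `sample_subspace` and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random


def two_nn_dimension(points):
    """TwoNN maximum-likelihood estimate of intrinsic dimension.

    ASSUMPTION: the paper does not name its estimator; TwoNN is used here
    purely for illustration.
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # nearest and second-nearest distances
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        if r1 > 0:  # skip exact duplicates
            log_ratios.append(math.log(r2 / r1))
    # MLE for the Pareto-distributed ratio r2 / r1
    return len(log_ratios) / sum(log_ratios)


def sample_subspace(n, intrinsic_dim, ambient_dim, rng):
    """Hypothetical stand-in for hidden states: points uniform in an
    intrinsic_dim cube, zero-padded to ambient_dim coordinates."""
    pad = [0.0] * (ambient_dim - intrinsic_dim)
    return [[rng.random() for _ in range(intrinsic_dim)] + pad for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    low = sample_subspace(400, 2, 8, rng)   # toy "factual" cloud
    high = sample_subspace(400, 6, 8, rng)  # toy "uncertain" cloud
    print(two_nn_dimension(low), two_nn_dimension(high))
```

On real models the point clouds would be hidden-state vectors collected for factual and uncertain prompts; the toy data here only shows that the estimator separates a low-dimensional cloud from a higher-dimensional one embedded in the same ambient space.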

Abstract
We show that language models hallucinate not because they fail to detect uncertainty, but because they fail to integrate it into output generation. Across architectures, uncertain inputs are reliably identified, occupying high-dimensional regions with 2-3$\times$ the intrinsic dimensionality of factual inputs. However, this internal signal is weakly coupled to the output layer: uncertainty migrates into low-sensitivity subspaces, becoming geometrically amplified yet functionally silent. Topological analysis shows that uncertainty representations fragment rather than converge to a unified abstention state, while gradient and Fisher probes reveal collapsing sensitivity along the uncertainty direction. Because cross-entropy training provides no attractor for abstention and uniformly rewards confident prediction, associative mechanisms amplify these fractured activations until residual coupling forces a committed output despite internal detection. Causal interventions confirm this account by restoring refusal when uncertainty is directly connected to logits.
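
The abstract does not spell out how uncertainty is "directly connected to logits". One way to picture such an intervention, as a minimal sketch under stated assumptions, is to project the hidden state onto a single uncertainty direction and add the scaled projection to the logit of an abstention token. The single linear direction, the dedicated abstention index, the `gain` parameter, and the function names are all hypothetical and are not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inject_uncertainty(logits, hidden, direction, abstain_index, gain=1.0):
    """Return a copy of logits with the abstention logit boosted.

    ASSUMPTIONS (not from the paper): uncertainty is read out as the
    projection of `hidden` onto one `direction`, and abstention maps to
    a single vocabulary index.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    score = sum(h * d for h, d in zip(hidden, direction)) / norm
    patched = list(logits)  # leave the caller's logits untouched
    patched[abstain_index] += gain * score
    return patched


if __name__ == "__main__":
    logits = [2.0, 0.5, 0.0]   # index 2 plays the abstention token
    direction = [0.0, 2.0]     # toy uncertainty direction
    uncertain_state = [3.0, 4.0]
    patched = inject_uncertainty(logits, uncertain_state, direction, abstain_index=2)
    print(softmax(logits))
    print(softmax(patched))
```

In this toy setup, a hidden state with no component along the direction leaves the logits unchanged, while a strongly uncertain state shifts probability mass toward the abstention index.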
Problem

Research questions and friction points this paper is trying to address.

hallucination
uncertainty
language models
output generation
abstention
Innovation

Methods, ideas, or system contributions that make the work stand out.

hallucination
uncertainty integration
topological fragmentation
causal intervention
abstention mechanism