AI Summary
Large language models often express high confidence in incorrect responses, yet existing unsupervised uncertainty estimation methods rely predominantly on token-level signals and overlook the geometric structure of hidden states. This work proposes GLU, an approach built on the finding, reported here for the first time, that local uncertainty (token-level entropy) and global uncertainty (geometric entropy derived from hidden-state manifolds) are approximately orthogonal and complementary. The global signal is particularly effective at detecting "confident yet wrong" failure cases. By multiplicatively fusing the two signals, GLU yields a length-normalized, architecture-agnostic uncertainty score from a single forward pass. Evaluated across three model families and six benchmarks, GLU matches or outperforms all unsupervised baselines.
Abstract
Large language models hallucinate confidently, making uncertainty quantification (UQ) essential for reliable deployment. Existing methods rely predominantly on token-level signals, leaving the geometric structure of intermediate hidden states underused. In this paper, we take the geometric complexity of hidden-state matrices as a measure of the global uncertainty of LLMs, while treating token-level uncertainty estimation as a local metric. We show that hidden-state geometric entropy (global uncertainty) and token-level entropy (local uncertainty) are statistically near-orthogonal, capturing distinct failure regimes for reliability prediction. In particular, global geometry recovers the confident-but-wrong failure mode that local signals systematically miss. Building on this, we propose Global-Local Uncertainty (GLU), an unsupervised, single-pass score that fuses the two signals via a multiplicative gate. Across three model families and six benchmarks, GLU matches or outperforms all unsupervised baselines while requiring only a single forward pass and remaining length-normalized and architecture-agnostic.
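To make the global/local distinction concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the exact definitions, so everything here is an assumption for illustration: the local signal is taken to be mean token entropy normalized by log-vocabulary size, the global signal is taken to be the normalized spectral entropy of the centered hidden-state matrix, and the fusion `(1 + local) * (1 + global) - 1` is a hypothetical multiplicative form chosen so that a confident-but-wrong case (low local, high global) still receives a nonzero score. The function names `token_entropy`, `geometric_entropy`, and `glu_score` are likewise hypothetical.

```python
import numpy as np


def token_entropy(logits: np.ndarray) -> float:
    """Mean Shannon entropy (nats) of per-token distributions; logits is (T, V)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    p = np.exp(logp)
    return float(-(p * logp).sum(axis=-1).mean())


def geometric_entropy(hidden: np.ndarray) -> float:
    """Normalized spectral entropy of a (T, d) hidden-state matrix.

    Assumption: 'geometric entropy' is approximated by the entropy of the
    normalized squared singular values, divided by log(min(T, d)).
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if len(s) < 2 or total <= 0:
        return 0.0
    q = energy / total
    q = q[q > 0]  # skip exact zeros so log is defined
    return float(-(q * np.log(q)).sum() / np.log(len(s)))


def glu_score(logits: np.ndarray, hidden: np.ndarray) -> float:
    """Hypothetical multiplicative fusion of local and global uncertainty."""
    vocab_size = logits.shape[-1]
    local = token_entropy(logits) / np.log(vocab_size)  # scaled to [0, 1]
    global_ = geometric_entropy(hidden)                 # scaled to [0, 1]
    # Assumed form, not the paper's gate: monotone in both signals and
    # nonzero whenever either signal is nonzero.
    return (1.0 + local) * (1.0 + global_) - 1.0
```

In practice the logits and hidden states would both come from the same forward pass of the model, which is what keeps a score of this kind single-pass. Under these assumptions, a response with sharply peaked token distributions but a spread-out hidden-state spectrum still scores higher than one where both signals are low.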