🤖 AI Summary
Large language models (LLMs) frequently generate hallucinated outputs, and existing post-softmax uncertainty estimates—such as semantic entropy—fail to accurately reflect the model’s intrinsic confidence due to softmax-induced probability distortion. To address this, we propose Semantic Energy, a novel hallucination detection framework that introduces an energy-based formulation for semantic uncertainty modeling. Unlike conventional approaches, Semantic Energy operates directly on the penultimate-layer logits, bypassing softmax normalization entirely. It integrates semantic clustering with a Boltzmann energy distribution to jointly quantify response-level semantic consistency and diversity. Crucially, it avoids probability calibration constraints, enabling more sensitive detection of low-confidence hallucinations. Extensive evaluations across multiple benchmarks demonstrate that Semantic Energy consistently outperforms state-of-the-art entropy-based methods, delivering more robust, interpretable, and calibration-free uncertainty signals.
📝 Abstract
Large Language Models (LLMs) are being increasingly deployed in real-world applications, but they remain susceptible to hallucinations: fluent yet incorrect responses that can lead to erroneous decision-making. Uncertainty estimation is a feasible approach to detecting such hallucinations. For example, semantic entropy estimates uncertainty from the semantic diversity across multiple sampled responses, thus identifying hallucinations. However, semantic entropy relies on post-softmax probabilities and fails to capture the model's inherent uncertainty, making it ineffective in certain scenarios. To address this issue, we introduce Semantic Energy, a novel uncertainty estimation framework that leverages the inherent confidence of LLMs by operating directly on the penultimate-layer logits. By combining semantic clustering with a Boltzmann-inspired energy distribution, our method better captures uncertainty in cases where semantic entropy fails. Experiments across multiple benchmarks show that Semantic Energy significantly improves hallucination detection and uncertainty estimation, offering more reliable signals for downstream applications.
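The idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's exact method: it assumes an energy score of the form E = -T · logsumexp(logits / T) computed from penultimate-layer logits (the standard Boltzmann/free-energy form used in energy-based uncertainty work), and it assumes semantic clustering of the sampled responses has already been done elsewhere (e.g., via bidirectional entailment). All function names are hypothetical.

```python
import numpy as np

def token_energy(logits, T=1.0):
    # Boltzmann-style energy of one token's penultimate-layer logits:
    # E = -T * log(sum_i exp(logit_i / T)).
    # Works on raw logits directly; no softmax normalization involved.
    z = logits / T
    m = z.max()  # stabilize the log-sum-exp
    return -T * (m + np.log(np.exp(z - m).sum()))

def response_energy(token_logits, T=1.0):
    # One illustrative aggregation: mean token energy over a response.
    return float(np.mean([token_energy(l, T) for l in token_logits]))

def semantic_energy(responses, clusters, T=1.0):
    # responses: list of responses, each a list of per-token logit vectors
    # clusters:  one semantic-cluster id per response (clustering assumed
    #            to be computed externally, e.g. by an entailment model)
    # Average response energies within each cluster, then weight clusters
    # by their share of samples. Higher energy = lower model confidence,
    # i.e. a stronger hallucination signal.
    energies = np.array([response_energy(r, T) for r in responses])
    ids = np.array(clusters)
    total = 0.0
    for c in np.unique(ids):
        mask = ids == c
        total += (mask.sum() / len(ids)) * energies[mask].mean()
    return float(total)
```

Note that because the score never passes through a softmax, the overall magnitude of the logits is preserved: a response generated with large, peaked logits yields a much lower energy than one generated with small, flat logits, even if both would produce similar post-softmax distributions.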