🤖 AI Summary
To address the inefficiency of semantic uncertainty quantification in large language model (LLM) hallucination detection, this paper proposes a Bayesian inference–based algorithm for efficient semantic entropy estimation. Methodologically, it introduces the first adaptive importance sampling scheme that dynamically allocates samples according to contextual difficulty; theoretically, it proves that even a single-sample semantic entropy estimate retains discriminative power, breaking the conventional reliance on multiple samples. By modeling sequence-level semantic similarity in embedding space, the approach achieves high-accuracy uncertainty quantification under tight computational budgets. Experiments show that, at equal AUROC, the method requires only 59% of the LLM calls needed by Farquhar et al. (2024), substantially reducing detection overhead. Key contributions include: (i) the first provably single-sample–effective semantic entropy estimation framework; (ii) the first adaptive sampling strategy specifically designed for semantic entropy; and (iii) a lightweight hallucination detection paradigm that jointly optimizes accuracy and efficiency.
📝 Abstract
Detecting whether an LLM hallucinates is an important research challenge. One promising way of doing so is to estimate the semantic entropy (Farquhar et al., 2024) of the distribution of generated sequences. We propose a new algorithm for doing so, with two main advantages. First, because we take a Bayesian approach, we achieve much higher-quality semantic entropy estimates for a given budget of samples from the LLM. Second, we can tune the number of samples adaptively, so that 'harder' contexts receive more samples. We demonstrate empirically that our approach systematically beats the baselines, requiring only 59% of the samples used by Farquhar et al. (2024) to achieve the same quality of hallucination detection as measured by AUROC. Moreover, quite counterintuitively, our estimator is useful even with just one sample from the LLM.
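To make the underlying quantity concrete: semantic entropy is the entropy of the distribution over *meanings* of sampled generations, rather than over raw token sequences. The sketch below is a minimal, hypothetical illustration of the standard Monte Carlo estimate, using toy 2-D vectors in place of real sequence embeddings and a greedy cosine-similarity clustering in place of the bidirectional-entailment clustering of Farquhar et al. (2024); the function names and the `threshold` parameter are illustrative assumptions, not the paper's API.

```python
import math

def cluster_by_similarity(embeddings, threshold=0.9):
    """Greedy clustering: each sample joins the first cluster whose
    representative embedding has cosine similarity >= threshold.
    (Stand-in for semantic-equivalence clustering; threshold is a
    hypothetical knob, not from the paper.)"""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    clusters = []  # lists of sample indices
    reps = []      # one representative embedding per cluster
    for i, e in enumerate(embeddings):
        for c, r in zip(clusters, reps):
            if cos(e, r) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
            reps.append(e)
    return clusters

def semantic_entropy(embeddings, threshold=0.9):
    """Plain Monte Carlo estimate: entropy of the empirical
    distribution over semantic clusters of the sampled generations."""
    clusters = cluster_by_similarity(embeddings, threshold)
    n = len(embeddings)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Toy example: four sampled answers, the first three semantically
# equivalent, the fourth divergent (a possible hallucination signal).
emb = [
    [1.0, 0.0],
    [0.99, 0.14],
    [0.98, 0.20],
    [0.0, 1.0],
]
print(semantic_entropy(emb))  # low when most answers agree in meaning
```

A high value means the model's sampled answers disagree in meaning, which is the signal used for hallucination detection; the paper's contribution is estimating this quantity accurately from far fewer samples via a Bayesian, adaptive scheme rather than this naive empirical average.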