Zero-source LLM Hallucination Detection with Human-like Criteria Probing

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of evaluating the factual consistency of large language model (LLM) generations in a zero-source setting, where neither model internals nor external references are available. The authors propose Human-like Criteria Probing (HCP), a mechanism that mimics human evaluators’ multi-faceted reasoning: an LLM agent decomposes its truthfulness judgment into a weighted set of interpretable criteria and aggregates the criterion-specific scores, using only the query-answer pair. To train this adaptive behavior, they introduce a reward-based alignment scheme that relies only on weak supervision from semantic consistency, so no external knowledge is required. At inference, a multi-sampling aggregation strategy makes decisions more robust while keeping them interpretable. The authors report that the method consistently outperforms state-of-the-art baselines in extensive experiments.

📝 Abstract

Large language models (LLMs) often hallucinate by generating factually incorrect or unfaithful content, posing significant risks to their safe use. Detecting such hallucinations is particularly challenging under the zero-source constraint, where no model internals or external references are available, and detection must rely solely on the textual query-answer pair. In this paper, we propose Human-like Criteria Probing for Hallucination Detection (HCPD), a paradigm that emulates the multi-faceted reasoning of human evaluators. Its core is a Human-like Criteria Probing (HCP) mechanism, in which an LLM agent adaptively decomposes its judgment into a weighted set of interpretable criteria and aggregates criterion-specific scores into a final truthfulness measure. To achieve this adaptive capability, we introduce a reward-based alignment scheme using only weak supervision from semantic consistency. At inference, we employ a multi-sampling aggregation strategy to ensure robust decisions while preserving full interpretability. We further provide theoretical analysis supporting the reliability of our approach. Extensive experiments show that HCPD consistently outperforms state-of-the-art baselines, offering an effective and explainable solution for zero-source hallucination detection. Code is available at https://github.com/TRISKEL10N/HCPD.
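
The aggregation idea in the abstract can be illustrated with a minimal Python sketch. It is not taken from the HCPD repository. The sketch makes three assumptions the abstract does not state: criterion scores lie in [0, 1], each sample's truthfulness is a normalized weighted average, and samples are combined by a plain mean compared against a threshold of 0.5. The `Criterion` class, the function names, the criterion labels, and the numbers are all hypothetical. In the paper an LLM agent produces the criteria, weights, and scores; here they are hard-coded.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Criterion:
    name: str      # interpretable criterion label (hypothetical examples below)
    weight: float  # non-negative importance the agent assigns to this criterion
    score: float   # criterion-specific score, assumed to lie in [0, 1]


def truthfulness(criteria):
    """Weighted average of criterion scores, with weights normalized to sum to 1."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("criterion weights must sum to a positive value")
    return sum(c.weight * c.score for c in criteria) / total


def detect_hallucination(samples, threshold=0.5):
    """Average truthfulness over sampled judgments; flag scores below the threshold.

    The mean and the 0.5 threshold are assumptions made for this sketch.
    """
    final = mean(truthfulness(s) for s in samples)
    return final, final < threshold


# Two hard-coded sampled judgments for one query-answer pair (illustrative values).
samples = [
    [Criterion("factual plausibility", 0.5, 0.2),
     Criterion("internal consistency", 0.5, 0.6)],
    [Criterion("factual plausibility", 0.75, 0.0),
     Criterion("specificity", 0.25, 0.8)],
]
score, is_hallucination = detect_hallucination(samples)
print(f"truthfulness={score:.2f}, hallucination={is_hallucination}")
```

Each sample keeps its own criteria and weights, so the final decision can be traced back to the individual criterion scores. That traceability is the interpretability property the abstract emphasizes.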

Problem

Research questions and friction points this paper is trying to address.

LLM hallucination

zero-source detection

truthfulness evaluation

factuality assessment

query-answer pair

Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-source hallucination detection

human-like criteria probing

interpretable evaluation