Can LLMs Detect Their Own Hallucinations?

📅 2025-11-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether large language models (LLMs) possess an intrinsic capability for self-hallucination detection, i.e., the ability to identify factually incorrect statements they themselves generate. Method: The authors propose a Chain-of-Thought (CoT)-driven framework for self-consistency analysis and knowledge provenance, formalizing hallucination detection as a sentence-level binary classification task. Crucially, the framework explicitly elicits implicit knowledge encoded in the model's parameters to support self-verification. Contribution/Results: Experiments show that GPT-3.5 Turbo detects 58.2% of its own hallucinations under this framework, substantially outperforming non-reasoning baselines, which suggests that LLMs exhibit a non-negligible intrinsic self-verification capacity. Key contributions: (1) a CoT-based classification paradigm designed specifically for self-hallucination detection; (2) empirical evidence that model-internal implicit knowledge is effective for factual self-assessment; and (3) a pathway toward improving LLM reliability through parameter-intrinsic verification mechanisms.

📝 Abstract
Large language models (LLMs) can generate fluent responses but sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hallucination detection as a sentence-level classification task. We propose a framework for estimating LLMs' capability for hallucination detection and a classification method that uses Chain-of-Thought (CoT) to extract knowledge from their parameters. The experimental results indicated that GPT-3.5 Turbo with CoT detected 58.2% of its own hallucinations. We concluded that LLMs with CoT can detect hallucinations if sufficient knowledge is contained in their parameters.
Problem

Research questions and friction points this paper is trying to address.

Investigating whether LLMs can detect their own factual hallucinations
Proposing a classification framework using Chain-of-Thought for detection
Evaluating LLMs' capability to identify hallucinations using internal knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Chain-of-Thought for hallucination classification
Extracts knowledge from model parameters directly
Formulates detection as sentence classification task
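The framing described above, i.e. eliciting the model's own knowledge with a CoT prompt and reducing the verdict to a binary sentence label, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `query_llm` is a hypothetical stand-in for any chat-completion call, and the prompt wording and `VERDICT:` convention are assumptions.

```python
# Sketch of CoT-based self-hallucination detection as sentence-level
# binary classification. `query_llm` is a hypothetical callable that
# takes a prompt string and returns the model's text response.

def build_cot_prompt(sentence: str) -> str:
    """Ask the model to recall its own knowledge, then judge the sentence."""
    return (
        "Consider the following sentence that you previously generated:\n"
        f'  "{sentence}"\n'
        "First, recall any relevant facts you know about its subject.\n"
        "Then reason step by step about whether the sentence is factually\n"
        "correct, and end with one line: VERDICT: SUPPORTED or "
        "VERDICT: HALLUCINATED."
    )

def classify_sentence(sentence: str, query_llm) -> bool:
    """Return True if the model labels its own sentence a hallucination."""
    reply = query_llm(build_cot_prompt(sentence))
    return "VERDICT: HALLUCINATED" in reply.upper()

# Toy stand-in model, for demonstration only.
def toy_llm(prompt: str) -> str:
    return ("The Eiffel Tower is located in Paris, not Berlin.\n"
            "VERDICT: HALLUCINATED")

print(classify_sentence("The Eiffel Tower is in Berlin.", toy_llm))  # True
```

In practice `query_llm` would wrap an API call to the model under evaluation (GPT-3.5 Turbo in the paper's experiments), and the per-sentence booleans would be aggregated into the detection rate the abstract reports.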