🤖 AI Summary
This study investigates whether large language models (LLMs) possess an intrinsic self-hallucination detection capability, i.e., the ability to identify factually incorrect statements that they themselves generate.
Method: We propose a Chain-of-Thought (CoT)-driven framework for self-consistency analysis and knowledge provenance, formalizing hallucination detection as a sentence-level binary classification task. Crucially, the framework explicitly extracts implicit knowledge encoded in the model's parameters to support self-verification, a first in hallucination research.
Contribution/Results: Experiments show that GPT-3.5 Turbo detects 58.2% of its own hallucinations under this framework, substantially outperforming non-reasoning baselines. This demonstrates that LLMs exhibit a non-negligible intrinsic self-verification capacity. Our key contributions are: (1) the first CoT-based classification paradigm specifically designed for self-hallucination detection; (2) empirical validation that model-internal implicit knowledge is effective for factual self-assessment; and (3) a novel pathway toward enhancing LLM reliability through parameter-intrinsic verification mechanisms.
📝 Abstract
Large language models (LLMs) can generate fluent responses, but they sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hallucination detection as a sentence-level binary classification task. We propose a framework for estimating LLMs' capability of hallucination detection, together with a classification method that uses Chain-of-Thought (CoT) prompting to extract knowledge from their parameters. The experimental results indicate that GPT-3.5 Turbo with CoT detects 58.2% of its own hallucinations. We conclude that LLMs with CoT can detect hallucinations if sufficient knowledge is contained in their parameters.
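The detection setup described above can be sketched as a prompt-and-parse loop: a CoT prompt first asks the model to recall what it knows about the sentence's topic (surfacing implicit parametric knowledge), then to compare the sentence against that recalled knowledge and emit a binary verdict. The following is a minimal illustrative sketch; the function names, prompt wording, and verdict labels (`FACTUAL` / `HALLUCINATED`) are assumptions for illustration, not the authors' exact implementation.

```python
def build_cot_prompt(sentence: str) -> str:
    """Build a hypothetical CoT prompt for sentence-level
    self-hallucination detection: recall parametric knowledge
    first, then judge the sentence against it."""
    return (
        "Step 1: Write down what you know about the topic of the "
        "sentence below, using only knowledge recalled from your "
        "own parameters.\n"
        "Step 2: Compare the sentence against that knowledge and "
        "answer with exactly 'FACTUAL' or 'HALLUCINATED'.\n\n"
        f"Sentence: {sentence}"
    )

def parse_verdict(model_output: str) -> bool:
    """Return True if the model's final verdict is HALLUCINATED.

    The last occurrence of either label is taken as the verdict,
    so mentions earlier in the reasoning chain are ignored.
    """
    text = model_output.upper()
    h = text.rfind("HALLUCINATED")
    f = text.rfind("FACTUAL")
    if h == -1 and f == -1:
        raise ValueError("no verdict found in model output")
    return h > f
```

In practice, `build_cot_prompt` would be sent to the model under test (e.g., GPT-3.5 Turbo) via its chat API, and `parse_verdict` applied to the completion; the binary labels then feed directly into the sentence-level classification metrics.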