Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) are susceptible to factual hallucinations induced by erroneous information in training data, undermining their reliability. Method: This paper systematically surveys factuality evaluation methodologies, addressing three core challenges: hallucination detection, limitations of existing benchmark datasets, and the reliability of evaluation metrics. We formulate five key research questions and propose a domain-customized fact-checking framework integrating instruction tuning, retrieval-augmented generation (RAG), multi-agent reasoning, and external knowledge integration. Enhanced interpretability and output consistency are achieved via advanced prompting strategies and domain-specific fine-tuning. Contribution/Results: Empirical results demonstrate that evidence-aligned evaluation—leveraging external verifiable sources—significantly outperforms purely autoregressive metrics in hallucination mitigation. The proposed framework advances the development of high-fidelity, context-aware, and domain-adapted trustworthy language models.

📝 Abstract
Large Language Models (LLMs) are trained on vast and diverse internet corpora that often include inaccurate or misleading content. Consequently, LLMs can generate misinformation, making robust fact-checking essential. This review systematically analyzes how LLM-generated content is evaluated for factual accuracy by exploring key challenges such as hallucinations, dataset limitations, and the reliability of evaluation metrics. The review emphasizes the need for strong fact-checking frameworks that integrate advanced prompting strategies, domain-specific fine-tuning, and retrieval-augmented generation (RAG) methods. It proposes five research questions that guide the analysis of the recent literature from 2020 to 2025, focusing on evaluation methods and mitigation techniques. The review also discusses the role of instruction tuning, multi-agent reasoning, and external knowledge access via RAG frameworks. Key findings highlight the limitations of current metrics, the value of grounding outputs with validated external evidence, and the importance of domain-specific customization to improve factual consistency. Overall, the review underlines the importance of building LLMs that are not only accurate and explainable but also tailored for domain-specific fact-checking. These insights contribute to the advancement of research toward more trustworthy and context-aware language models.
Problem

Research questions and friction points this paper is trying to address.

Evaluating factual accuracy in LLM-generated content
Addressing hallucinations and dataset limitations in LLMs
Developing robust fact-checking frameworks for LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Advanced prompting strategies for fact-checking
Domain-specific fine-tuning for accuracy
Retrieval-augmented generation (RAG) methods
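The retrieve-then-verify loop behind such RAG methods can be sketched in a few lines. The snippet below is a minimal illustration under toy assumptions: keyword-overlap retrieval stands in for a real retriever, and a lexical-support rule stands in for the LLM judge. Function names and the threshold are ours, not the paper's framework.

```python
# Minimal sketch of evidence-grounded (RAG-style) fact-checking:
# retrieve candidate passages, then judge the claim against them.
# The corpus, overlap scoring, and threshold are illustrative stand-ins.

def tokens(text: str) -> set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in text.split()}

def retrieve(claim: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the claim."""
    return sorted(corpus, key=lambda p: len(tokens(claim) & tokens(p)),
                  reverse=True)[:k]

def verify(claim: str, evidence: list[str]) -> str:
    """Stand-in for an LLM judge. A real system would prompt an
    instruction-tuned model with the claim plus retrieved evidence
    and parse a supported / refuted / not-enough-info verdict."""
    support = sum(len(tokens(claim) & tokens(p)) for p in evidence)
    return "supported" if support >= 4 else "not enough evidence"

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
]
claim = "The Eiffel Tower is located in Paris."
evidence = retrieve(claim, corpus)
print(verify(claim, evidence))  # → supported
```

Grounding the verdict in retrieved passages, rather than in the model's parametric memory alone, is what the review's "evidence-aligned evaluation" refers to.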
Subhey Sadi Rahman
Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Md. Adnanul Islam
Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Md. Mahbub Alam
Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Musarrat Zeba
Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Md. Abdur Rahman
Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Sadia Sultana Chowa
Faculty of Science and Technology, Charles Darwin University, Casuarina, NT 0909, Australia
Mohaimenul Azam Khan Raiaan
PhD Student, Monash University
Sami Azam
Faculty of Science and Technology, Charles Darwin University, Casuarina, NT 0909, Australia