HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address hallucination in retrieval-augmented generation (RAG), which undermines model trustworthiness, this paper proposes a lightweight, efficient, and interpretable hallucination detection method. We design a compact 4B-parameter reasoning model that directly classifies document-claim pairs and generates evidence-based natural-language explanations. To mitigate annotation scarcity, we construct a domain-agnostic synthetic dataset derived from FineWeb and distill large-model reasoning capabilities via preference fine-tuning with Odds Ratio Preference Optimization (ORPO). Our method achieves 84.0% balanced accuracy on the RAGTruth subset, matching the performance of 7B-8B specialized models, and 75.7% balanced accuracy across the full benchmark, comparable to GPT-4o. The core contribution is a hallucination detection paradigm that jointly achieves high efficiency, strong performance, and inherent interpretability, empirically validating the feasibility of small models for ensuring RAG reliability.

📝 Abstract
Large Language Models (LLMs) excel in many NLP tasks but remain prone to hallucinations, limiting trust in real-world applications. We present HalluGuard, a 4B-parameter Small Reasoning Model (SRM) for mitigating hallucinations in Retrieval-Augmented Generation (RAG). HalluGuard classifies document-claim pairs as grounded or hallucinated and produces evidence-grounded justifications for transparency. Our approach combines (i) a domain-agnostic synthetic dataset derived from FineWeb and refined through multi-stage curation and data reformation, (ii) synthetic grounded and hallucinated claims, and (iii) preference-based fine-tuning with Odds Ratio Preference Optimization to distill large-model reasoning into a smaller backbone. On the RAGTruth subset of the LLM-AggreFact benchmark, HalluGuard achieves 84.0% balanced accuracy (BAcc), rivaling specialized models such as MiniCheck (7B; 84.0%) and Granite Guardian 3.3 (8B; 82.2%), while using roughly half their parameters. Over the full benchmark it reaches 75.7% BAcc, matching larger general-purpose LLMs such as GPT-4o (75.9%). We will release HalluGuard and datasets under Apache 2.0 upon acceptance.
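The classification task described in the abstract can be sketched as a prompt-and-parse loop around the model: format a document-claim pair into a prompt, then extract the verdict and the evidence-grounded justification from the model's free-text answer. The template wording, labels, and output format below are illustrative assumptions, not the paper's actual prompt.

```python
# Hypothetical sketch of HalluGuard-style inference. The prompt wording,
# the GROUNDED/HALLUCINATED labels, and the "Verdict:/Evidence:" output
# format are assumptions for illustration only.

PROMPT_TEMPLATE = """You are a grounding judge. Given a document and a claim,
decide whether the claim is GROUNDED in the document or HALLUCINATED,
and justify your verdict with evidence quoted from the document.

Document:
{document}

Claim:
{claim}

Answer with 'Verdict: GROUNDED' or 'Verdict: HALLUCINATED', then 'Evidence: ...'."""


def build_prompt(document: str, claim: str) -> str:
    """Format one document-claim pair for the model."""
    return PROMPT_TEMPLATE.format(document=document, claim=claim)


def parse_verdict(model_output: str) -> tuple[str, str]:
    """Extract (verdict, evidence) from the model's free-text answer."""
    verdict = ("HALLUCINATED" if "Verdict: HALLUCINATED" in model_output
               else "GROUNDED")
    evidence = model_output.split("Evidence:", 1)[-1].strip()
    return verdict, evidence


out = "Verdict: HALLUCINATED\nEvidence: The document never mentions a 2019 launch date."
print(parse_verdict(out))
```

In a real pipeline the string between `build_prompt` and `parse_verdict` would be produced by the 4B model; parsing a structured verdict plus quoted evidence is what gives the method its interpretability.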
Problem

Research questions and friction points this paper is trying to address.

Mitigating hallucinations in retrieval-augmented generation models
Classifying document-claim pairs as grounded or hallucinated
Providing evidence-grounded justifications for transparency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Small Reasoning Model classifies document-claim pairs
Synthetic dataset with curated grounded and hallucinated claims
Preference-based fine-tuning distills large-model reasoning
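The last bullet refers to Odds Ratio Preference Optimization (ORPO), which augments the standard fine-tuning loss on the preferred (chosen) response with an odds-ratio penalty that pushes the chosen response's likelihood above the rejected one's. A minimal scalar sketch of that objective, with the weighting coefficient `lam` chosen arbitrarily for illustration:

```python
import math


def odds(p: float) -> float:
    """Odds of a response with (length-normalized) likelihood p."""
    return p / (1.0 - p)


def orpo_loss(p_chosen: float, p_rejected: float, lam: float = 0.1) -> float:
    """ORPO objective on one preference pair: NLL of the chosen response
    plus a weighted log-sigmoid penalty on the log odds ratio between
    the chosen and rejected responses."""
    nll = -math.log(p_chosen)
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # -log sigmoid(log odds ratio): small when chosen odds dominate
    penalty = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    return nll + lam * penalty


# The loss falls as the model prefers the chosen answer more strongly:
print(orpo_loss(0.6, 0.4) > orpo_loss(0.8, 0.2))  # → True
```

In HalluGuard's setting, the chosen response would be a correct, evidence-grounded judgment distilled from a larger model and the rejected response a flawed one, so the single objective handles both distillation and preference alignment without a separate reference model.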