Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate factually inaccurate content—so-called hallucinations—while existing mitigation approaches often rely on external knowledge sources. This paper proposes a knowledge-free, fine-grained cross-model consistency framework that detects and corrects hallucinations without external retrieval. It elicits responses from multiple black-box LLMs using semantically equivalent prompts, identifies erroneous spans by analyzing inter-model output discrepancies, and applies targeted revision that preserves correct content. Crucially, the method decouples detection from correction at a fine-grained semantic level, enabling high-precision corrections with minimal disturbance to accurate text. On the FELM dataset, the approach improves hallucination detection F1 scores by 6–39%; on the GPQA-diamond benchmark, it boosts answer accuracy by 7–8 absolute percentage points for state-of-the-art models including Llama 4 Maverick and Claude 4 Sonnet. These results demonstrate both efficacy and strong generalization across diverse LLMs.

📝 Abstract
Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, but they remain susceptible to hallucinations: generating content that appears plausible but contains factual inaccuracies. We present Finch-Zk, a black-box framework that leverages FINe-grained Cross-model consistency to detect and mitigate Hallucinations in LLM outputs without requiring external knowledge sources. Finch-Zk introduces two key innovations: 1) a cross-model consistency checking strategy that reveals fine-grained inaccuracies by comparing responses generated by diverse models from semantically equivalent prompts, and 2) a targeted mitigation technique that applies precise corrections to problematic segments while preserving accurate content. Experiments on the FELM dataset show Finch-Zk improves hallucination detection F1 scores by 6–39% compared to existing approaches. For mitigation, Finch-Zk achieves a 7–8 absolute percentage point improvement in answer accuracy on the GPQA-diamond dataset when applied to state-of-the-art models like Llama 4 Maverick and Claude 4 Sonnet. Extensive evaluation across multiple models demonstrates that Finch-Zk provides a practical, deployment-ready safeguard for enhancing factual reliability in production LLM systems.
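The cross-model consistency idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it stands in for semantic comparison with a simple token-overlap (Jaccard) similarity, splits responses naively on sentence boundaries, and uses a made-up support threshold. The function names (`flag_inconsistent_spans`, `jaccard`) are my own.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two text fragments."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def flag_inconsistent_spans(spans, alt_responses, threshold=0.3):
    """Flag spans of a primary model's answer that find little support in
    the other models' responses: for each span, take the best-matching
    sentence from each alternative response, and flag the span if the mean
    best-match similarity falls below `threshold`."""
    flagged = []
    for i, span in enumerate(spans):
        supports = []
        for resp in alt_responses:
            best = max((jaccard(span, sent) for sent in resp.split(". ")),
                       default=0.0)
            supports.append(best)
        if sum(supports) / len(supports) < threshold:
            flagged.append(i)
    return flagged


spans = ["The Eiffel Tower is in Paris",
         "Antoni Gaudi finished it around 1920"]  # second claim is false
others = [
    "The Eiffel Tower is located in Paris. "
    "Gustave Eiffel completed it in 1889",
    "The Eiffel Tower stands in central Paris. "
    "It was finished by Gustave Eiffel in 1889",
]
print(flag_inconsistent_spans(spans, others))  # → [1]
```

Only the second span is flagged: the first is corroborated by every alternative response, while the second disagrees with all of them. A real system would use embedding- or NLI-based semantic similarity rather than token overlap.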
Problem

Research questions and friction points this paper is trying to address.

Detecting factual inaccuracies in LLM outputs without access to external knowledge sources
Mitigating hallucinations via fine-grained cross-model consistency checking
Applying precise corrections to problematic segments while preserving accurate content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-model consistency checking over responses to semantically equivalent prompts, localizing inaccuracies at the span level
Targeted mitigation that corrects flagged segments while preserving accurate content
Fully black-box framework requiring no external knowledge source or retrieval
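The targeted-mitigation innovation above can be illustrated with a small prompt-construction sketch. This is an assumed prompt format, not the paper's actual template: the `[KEEP]`/`[REVISE]` markers and the helper name `build_revision_prompt` are hypothetical, but they capture the stated idea of revising only problematic segments while copying accurate ones verbatim.

```python
def build_revision_prompt(question, spans, flagged):
    """Assemble a revision prompt that marks only the flagged spans for
    rewriting and instructs the model to copy the rest verbatim
    (hypothetical prompt format, not the paper's exact template)."""
    lines = [f"Question: {question}", "Draft answer, split into segments:"]
    for i, span in enumerate(spans):
        tag = "REVISE" if i in flagged else "KEEP"
        lines.append(f"[{tag}] {span}")
    lines.append(
        "Rewrite only the segments marked REVISE so they are factually "
        "correct, and copy every KEEP segment verbatim."
    )
    return "\n".join(lines)


prompt = build_revision_prompt(
    "Who built the Eiffel Tower?",
    ["It stands in Paris", "Antoni Gaudi built it"],
    flagged={1},
)
print(prompt)
```

Sending such a prompt to the revising model confines edits to the flagged segments, which is the "low-disturbance" property the summary highlights.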