🤖 AI Summary
Large language models (LLMs) frequently generate factually incorrect statements, and conventional holistic fact verification approaches struggle to localize subtle factual errors. To address this, we propose a fine-grained fact verification paradigm: decomposing complex claims into semantically equivalent, independently verifiable sub-claims to improve error localization and make evidence retrieval more transparent. Toward this end, we introduce FactLens, the first benchmark explicitly designed for fine-grained fact verification, comprising human-annotated data, sub-claim generation, and alignment modeling. We further design a multidimensional automated evaluation framework that jointly assesses faithfulness, completeness, and verifiability, ensuring both semantic fidelity and contextual consistency. Our FactLens evaluator achieves high agreement with human judgments (Spearman ρ > 0.85) and systematically reveals, for the first time, how sub-claim characteristics such as length and abstraction level affect verification performance.
📝 Abstract
Large Language Models (LLMs) have shown impressive capability in language generation and understanding, but their tendency to hallucinate and produce factually incorrect information remains a key limitation. To verify LLM-generated content and claims from other sources, traditional verification approaches often rely on holistic models that assign a single factuality label to complex claims, potentially obscuring nuanced errors. In this paper, we advocate for a shift toward fine-grained verification, where complex claims are broken down into smaller sub-claims for individual verification, allowing for more precise identification of inaccuracies, improved transparency, and reduced ambiguity in evidence retrieval. However, generating sub-claims poses challenges, such as maintaining context and ensuring semantic equivalence with respect to the original claim. We introduce FactLens, a benchmark for evaluating fine-grained fact verification, with metrics and automated evaluators of sub-claim quality. The benchmark data is manually curated to ensure high-quality ground truth. Our results show alignment between automated FactLens evaluators and human judgments, and we discuss the impact of sub-claim characteristics on the overall verification performance.
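To make the contrast with holistic verification concrete, here is a minimal, hypothetical sketch of the fine-grained paradigm the abstract describes: a compound claim is split into independently verifiable sub-claims, each gets its own verdict, and failures are localized rather than hidden behind one overall label. The example claim, decomposition, and verdicts are hand-written stand-ins, not the paper's data or implementation.

```python
# Hypothetical sketch of fine-grained fact verification: decompose a compound
# claim into sub-claims, verify each independently, and report which sub-claims
# fail. In practice the decomposition and verdicts would come from learned
# components; here they are hard-coded for illustration.

from dataclasses import dataclass


@dataclass
class SubClaim:
    text: str
    verdict: bool  # True = supported by evidence, False = refuted


def verify_fine_grained(sub_claims: list[SubClaim]) -> tuple[bool, list[str]]:
    """Return the overall factuality label plus the texts of the failing
    sub-claims, so an error can be localized to a specific statement
    instead of being obscured by a single holistic label."""
    failures = [sc.text for sc in sub_claims if not sc.verdict]
    return len(failures) == 0, failures


# A compound claim with one true and one false component.
claim = "Marie Curie won two Nobel Prizes and was born in Paris."
sub_claims = [
    SubClaim("Marie Curie won two Nobel Prizes.", True),
    SubClaim("Marie Curie was born in Paris.", False),  # she was born in Warsaw
]

overall, failures = verify_fine_grained(sub_claims)
print(overall)   # False
print(failures)  # ['Marie Curie was born in Paris.']
```

A holistic verifier would only output `False` for the whole claim; the fine-grained decomposition additionally pinpoints which sub-claim is wrong, which is the error-localization benefit the paper argues for.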