FactLens: Benchmarking Fine-Grained Fact Verification

📅 2024-11-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) frequently generate factually incorrect statements, and conventional holistic fact verification approaches struggle to localize subtle factual errors. To address this, we propose a novel fine-grained fact verification paradigm: decomposing complex claims into semantically equivalent, independently verifiable sub-claims to enhance error localization accuracy and evidence retrieval transparency. Toward this end, we introduce FactLens—the first benchmark explicitly designed for fine-grained fact verification—comprising human-annotated data, sub-claim generation, and alignment modeling. We further design a multidimensional automated evaluation framework that jointly assesses faithfulness, completeness, and verifiability, ensuring both semantic fidelity and contextual consistency. Our FactLens evaluator achieves high agreement with human judgments (Spearman ρ > 0.85) and, for the first time, systematically reveals significant impacts of sub-claim characteristics—including length and abstraction level—on verification performance.
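The paradigm above can be sketched in a few lines: a complex claim is decomposed into independently verifiable sub-claims, and each sub-claim receives its own factuality label, which is what makes errors localizable. The decomposer and verifier below are deliberately toy stand-ins (string splitting and exact-match evidence lookup); FactLens itself relies on LLM-based sub-claim generation and retrieval, so everything here is an illustrative assumption, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class SubClaim:
    text: str
    label: str  # "supported" or "refuted"

def decompose(claim: str) -> list[str]:
    # Toy decomposer: split on the conjunction "and". A real system
    # would use an LLM to produce semantically equivalent sub-claims.
    return [part.strip() for part in claim.split(" and ")]

def verify(sub_claim: str, evidence: set[str]) -> str:
    # Toy verifier: exact-match lookup against an evidence set.
    return "supported" if sub_claim in evidence else "refuted"

evidence = {"Marie Curie won the Nobel Prize in Physics"}
claim = ("Marie Curie won the Nobel Prize in Physics "
         "and she was born in Vienna")

results = [SubClaim(s, verify(s, evidence)) for s in decompose(claim)]
for r in results:
    print(f"{r.label}: {r.text}")
```

A holistic verifier would have to assign this mixed claim a single label; the sub-claim view instead flags exactly which conjunct is wrong.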

📝 Abstract
Large Language Models (LLMs) have shown impressive capability in language generation and understanding, but their tendency to hallucinate and produce factually incorrect information remains a key limitation. To verify LLM-generated content and claims from other sources, traditional verification approaches often rely on holistic models that assign a single factuality label to complex claims, potentially obscuring nuanced errors. In this paper, we advocate for a shift toward fine-grained verification, where complex claims are broken down into smaller sub-claims for individual verification, allowing for more precise identification of inaccuracies, improved transparency, and reduced ambiguity in evidence retrieval. However, generating sub-claims poses challenges, such as maintaining context and ensuring semantic equivalence with respect to the original claim. We introduce FactLens, a benchmark for evaluating fine-grained fact verification, with metrics and automated evaluators of sub-claim quality. The benchmark data is manually curated to ensure high-quality ground truth. Our results show alignment between automated FactLens evaluators and human judgments, and we discuss the impact of sub-claim characteristics on the overall verification performance.
Problem

Research questions and friction points this paper is trying to address.

Evaluating fine-grained fact verification in LLMs
Addressing challenges in sub-claim generation and verification
Benchmarking automated vs human judgment alignment
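The third point, agreement between automated evaluators and human judgments, is reported as a Spearman rank correlation (ρ > 0.85 per the summary). A minimal pure-Python sketch of that check, with illustrative score lists rather than FactLens data:

```python
def rank(values: list[float]) -> list[float]:
    # Assign 1-based ranks, averaging ranks over ties.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: list[float], y: list[float]) -> float:
    # Spearman's rho = Pearson correlation of the rank vectors.
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

auto_scores = [0.9, 0.2, 0.7, 0.4, 0.8]    # illustrative evaluator scores
human_scores = [0.95, 0.1, 0.5, 0.6, 0.85]  # illustrative human ratings
print(round(spearman(auto_scores, human_scores), 3))  # prints 0.9
```

In practice one would use `scipy.stats.spearmanr`; the hand-rolled version just makes the rank-correlation computation explicit.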
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained sub-claim verification approach
Automated evaluators for sub-claim quality
Manually curated benchmark data
👥 Authors
Kushan Mitra (Megagon Labs)
Dan Zhang (Megagon Labs)
Sajjadur Rahman (Senior Manager, AI/ML, Adobe)
Estevam R. Hruschka (Megagon Labs)