🤖 AI Summary
This paper addresses the problem of inaccurate and incomplete claim extraction from long-form text generated by large language models (LLMs), which undermines the reliability of downstream fact-checking. To tackle this, we propose a standardized evaluation framework for claim extraction in the fact-checking setting. Methodologically, we design Claimify, an ambiguity-aware, LLM-based extraction method that emits claims only when there is high confidence in the correct interpretation of the source text, and we develop an automated evaluation pipeline built on two novel quantitative metrics, coverage and decontextualization, for scalable and replicable assessment. Our contributions are threefold: (1) a unified evaluation framework enabling fair cross-method comparison; (2) a demonstration that Claimify outperforms existing baselines across multiple metrics; and (3) empirical evidence that high-confidence constraints improve claim quality and, in turn, fact-checking reliability.
📝 Abstract
A common strategy for fact-checking long-form content generated by Large Language Models (LLMs) is extracting simple claims that can be verified independently. Since inaccurate or incomplete claims compromise fact-checking results, ensuring claim quality is critical. However, the lack of a standardized evaluation framework impedes assessment and comparison of claim extraction methods. To address this gap, we propose a framework for evaluating claim extraction in the context of fact-checking along with automated, scalable, and replicable methods for applying this framework, including novel approaches for measuring coverage and decontextualization. We also introduce Claimify, an LLM-based claim extraction method, and demonstrate that it outperforms existing methods under our evaluation framework. A key feature of Claimify is its ability to handle ambiguity and extract claims only when there is high confidence in the correct interpretation of the source text.
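The confidence-gated extraction idea and the coverage metric can be illustrated with a minimal sketch. All names, the threshold, and the scoring logic below are hypothetical assumptions for illustration, not Claimify's actual implementation or the paper's metric definitions:

```python
from dataclasses import dataclass

@dataclass
class CandidateClaim:
    text: str                    # claim rewritten to stand alone
    confidence: float            # hypothetical confidence that the interpretation is correct
    source_span: tuple           # (start, end) character offsets in the source text

def filter_claims(candidates, threshold=0.9):
    """Keep only claims whose interpretation confidence clears the threshold.

    Hypothetical stand-in for high-confidence gating: ambiguous sentences
    yield low-confidence candidates and are dropped rather than guessed at.
    """
    return [c for c in candidates if c.confidence >= threshold]

def coverage(claims, source_length):
    """Toy coverage proxy: fraction of source characters spanned by at
    least one retained claim (illustrative only, not the paper's metric)."""
    covered = set()
    for c in claims:
        covered.update(range(*c.source_span))
    return len(covered) / source_length if source_length else 0.0

candidates = [
    CandidateClaim("The Eiffel Tower is in Paris.", 0.97, (0, 30)),
    CandidateClaim("It was built for an exposition.", 0.55, (31, 62)),  # ambiguous "it"
]
kept = filter_claims(candidates)
print(len(kept))                        # 1: the ambiguous candidate is dropped
print(round(coverage(kept, 62), 2))     # 0.48: coverage falls when claims are withheld
```

The example also hints at the framework's central tension: raising the confidence threshold improves claim reliability but lowers coverage, which is why both must be measured together.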