Verification with Transparency: The TrendFact Benchmark for Auditable Fact-Checking via Natural Language Explanation

📅 2024-10-19
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing fact-checking benchmarks suffer from three key limitations: insufficient explanatory annotations, English-centric bias, and inadequate temporal sensitivity, all of which hinder the development of trustworthy verification systems. To address these, we introduce TrendFact, the first Chinese explainable fact-checking benchmark, comprising 7,643 claims drawn from multiple sources (social media and professional outlets) and requiring numerical, logical, and commonsense reasoning. We propose structured natural language explanations, a temporally aware evidence triangulation mechanism, and FactISR, an iterative self-reflective verification framework. Additionally, we design the Explainability Consistency Score (ECS) for rigorous interpretability evaluation. Experiments reveal that state-of-the-art reasoning models (e.g., DeepSeek-R1, o1) exhibit significant performance degradation on this realistic, complex task. In contrast, FactISR substantially improves both verification accuracy and explanation quality, establishing a new benchmark for trustworthy, explainable fact-checking.

📝 Abstract
While fact verification remains fundamental, explanation generation serves as a critical enabler for trustworthy fact-checking systems by producing interpretable rationales and facilitating comprehensive verification processes. However, current benchmarks exhibit critical limitations in three dimensions: (1) absence of explanatory annotations, (2) English-centric language bias, and (3) inadequate temporal relevance. To bridge these gaps, we present TrendFact, the first Chinese fact-checking benchmark incorporating structured natural language explanations. TrendFact comprises 7,643 carefully curated samples from trending social media content and professional fact-checking repositories, covering domains such as public health, political discourse, and economic claims. It supports various forms of reasoning, including numerical computation, logical reasoning, and commonsense verification. A rigorous multistage construction process ensures high data quality and poses significant challenges. Furthermore, we propose the Explainability Consistency Score (ECS) to complement existing evaluation metrics. To establish effective baselines for TrendFact, we propose FactISR, a dual-component method integrating evidence triangulation and an iterative self-reflection mechanism. Experimental results demonstrate that current leading reasoning models (e.g., DeepSeek-R1, o1) exhibit significant limitations on TrendFact, underscoring the real-world challenges it presents. FactISR significantly enhances reasoning model performance, offering new insights for explainable and complex fact-checking.
Problem

Research questions and friction points this paper is trying to address.

Lack of explanatory annotations in fact-checking benchmarks
English-centric bias in current verification systems
Inadequate temporal relevance in existing fact-checking datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

First Chinese fact-checking benchmark with explanations
Dual-component method integrating evidence and self-reflection
ECS metric complements existing evaluation standards
Xiaocheng Zhang
Faculty of Computing, Harbin Institute of Technology
Xi Wang
National University of Defense Technology
Yifei Lu
Northeastern University, Shenyang, China
Zhuangzhuang Ye
Faculty of Computing, Harbin Institute of Technology
Jianing Wang
MeiTuan
Mengjiao Bao
MeiTuan
Peng Yan
Research assistant at ZHAW; PhD student at UZH
Xiaohong Su
Faculty of Computing, Harbin Institute of Technology