Verification with Transparency: The TrendFact Benchmark for Auditable Fact-Checking via Natural Language Explanation

📅 2024-10-19
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing fact-checking benchmarks suffer from three key limitations: insufficient explanatory annotations, English-centric bias, and inadequate temporal sensitivity, all of which hinder the development of trustworthy verification systems. To address these, we introduce TrendFact, the first Chinese explainable fact-checking benchmark, comprising 7,643 claims drawn from multiple sources (social media and professional outlets) and requiring numerical, logical, and commonsense reasoning. We propose structured natural language explanations, a temporally aware evidence triangulation mechanism, and FactISR, an iterative self-reflective verification framework. Additionally, we design the Explainability Consistency Score (ECS) for rigorous interpretability evaluation. Experiments reveal that state-of-the-art reasoning models (e.g., DeepSeek-R1, o1) exhibit significant performance degradation on this realistic, complex task. In contrast, FactISR substantially improves both verification accuracy and explanation quality, establishing a new benchmark for trustworthy, explainable fact-checking.

📝 Abstract
While fact verification remains fundamental, explanation generation serves as a critical enabler for trustworthy fact-checking systems by producing interpretable rationales and facilitating comprehensive verification processes. However, current benchmarks exhibit critical limitations in three dimensions: (1) absence of explanatory annotations, (2) English-centric language bias, and (3) inadequate temporal relevance. To bridge these gaps, we present TrendFact, the first Chinese fact-checking benchmark incorporating structured natural language explanations. TrendFact comprises 7,643 carefully curated samples from trending social media content and professional fact-checking repositories, covering domains such as public health, political discourse, and economic claims. It supports various forms of reasoning, including numerical computation, logical reasoning, and commonsense verification. A rigorous multistage construction process ensures high data quality and poses significant challenges. Furthermore, we propose the Explainability Consistency Score (ECS) to complement existing evaluation metrics. To establish effective baselines for TrendFact, we propose FactISR, a dual-component method integrating evidence triangulation and an iterative self-reflection mechanism. Experimental results demonstrate that current leading reasoning models (e.g., DeepSeek-R1, o1) exhibit significant limitations on TrendFact, underscoring the real-world challenges it presents. FactISR significantly enhances reasoning model performance, offering new insights for explainable and complex fact-checking.
Problem

Research questions and friction points this paper is trying to address.

Lack of explanatory annotations in fact-checking benchmarks
English-centric bias in current verification systems
Inadequate temporal relevance in existing fact-checking datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

First Chinese fact-checking benchmark with explanations
Dual-component method integrating evidence and self-reflection
ECS metric complements existing evaluation standards
Xiaocheng Zhang
Faculty of Computing, Harbin Institute of Technology
Xi Wang
National University of Defense Technology
Yifei Lu
Northeastern University, Shenyang, China
Zhuangzhuang Ye
Faculty of Computing, Harbin Institute of Technology
Jianing Wang
MeiTuan
Mengjiao Bao
MeiTuan
Peng Yan
Research assistant at ZHAW; PhD student at UZH
Xiaohong Su
Faculty of Computing, Harbin Institute of Technology