Evaluating Evidence Attribution in Generated Fact Checking Explanations

📅 2024-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Evidence misattribution, a frequent failure in fact-checking explanation generation, leads to hallucinations and low credibility. To address this, we propose the Citation Masking and Recovery (CMR) evaluation protocol, the first quantifiable framework for assessing evidence attribution quality. Combining LLM-based automated annotation, crowdsourced human evaluation, and controlled comparative experiments, we find that state-of-the-art LLMs still exhibit substantial attribution error rates. However, LLM annotations align strongly with human annotations (Spearman ρ > 0.85), supporting their use as scalable proxies for human assessment. Crucially, our experiments demonstrate that human-curated evidence selection significantly improves both explanation accuracy and interpretability. This work establishes a novel, empirically grounded evaluation standard for trustworthy explanation generation in fact-checking systems.
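The reported human–LLM agreement is a rank correlation. As a minimal sketch (not from the paper's codebase), Spearman's ρ between human and LLM attribution scores can be computed with the tie-free rank-difference formula; the score lists below are purely illustrative, not the paper's data.

```python
# Minimal sketch: rank agreement between human and LLM attribution scores
# via Spearman's rho. Scores are hypothetical; the formula assumes no ties.

def rank(values):
    """Ascending ranks starting at 1 (tie-free case, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman_rho(xs, ys):
    """Spearman correlation: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(xs), rank(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

human_scores = [0.9, 0.4, 0.75, 0.2, 0.85, 0.6]    # hypothetical human ratings
llm_scores   = [0.95, 0.35, 0.55, 0.25, 0.9, 0.7]  # hypothetical LLM ratings
print(round(spearman_rho(human_scores, llm_scores), 3))  # → 0.943
```

A value above 0.85, as the summary reports, would indicate the LLM annotator preserves the human ranking closely enough to serve as a scalable proxy.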

📝 Abstract
Automated fact-checking systems often struggle with trustworthiness, as their generated explanations can include hallucinations. In this work, we explore evidence attribution for fact-checking explanation generation. We introduce a novel evaluation protocol -- citation masking and recovery -- to assess attribution quality in generated explanations. We implement our protocol using both human annotators and automatic annotators, and find that LLM annotation correlates with human annotation, suggesting that attribution assessment can be automated. Finally, our experiments reveal that: (1) the best-performing LLMs still generate explanations with inaccurate attributions; and (2) human-curated evidence is essential for generating better explanations. Code and data are available here: https://github.com/ruixing76/Transparent-FCExp.
Problem

Research questions and friction points this paper is trying to address.

Evaluating evidence attribution in fact-checking explanations.
Assessing attribution quality using citation masking.
Identifying inaccuracies in LLM-generated explanations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Citation masking and recovery
Automated attribution assessment
Human-curated evidence integration
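The masking-and-recovery idea can be sketched as follows: mask the citation markers in a generated explanation, ask an annotator to re-assign each masked slot to an evidence item, and score how many original attributions are recovered. This is an illustrative toy, assuming numeric `[1]`-style markers; the word-overlap "annotator" is a stand-in for the paper's human or LLM annotators.

```python
# Illustrative sketch of citation masking and recovery (not the paper's code):
# mask citations, recover them with a toy annotator, score recovery accuracy.
import re

def mask_citations(explanation):
    """Replace markers like [1] with [MASK]; return masked text and gold ids."""
    gold = [int(m) for m in re.findall(r"\[(\d+)\]", explanation)]
    return re.sub(r"\[\d+\]", "[MASK]", explanation), gold

def recover(masked, evidence):
    """Toy annotator: give each masked slot the evidence item with the most
    word overlap against the text preceding the mask."""
    preds = []
    for segment in masked.split("[MASK]")[:-1]:
        words = set(segment.lower().split())
        preds.append(max(evidence,
                         key=lambda i: len(words & set(evidence[i].lower().split()))))
    return preds

def recovery_accuracy(explanation, evidence):
    masked, gold = mask_citations(explanation)
    preds = recover(masked, evidence)
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

# Hypothetical evidence set and generated explanation.
evidence = {
    1: "The vaccine trial enrolled 40000 participants.",
    2: "Efficacy was reported at 95 percent after two doses.",
}
explanation = ("The claim is supported: the trial enrolled 40000 participants [1] "
               "and efficacy reached 95 percent after two doses [2]")
print(recovery_accuracy(explanation, evidence))  # → 1.0
```

Low recovery accuracy would signal that an explanation's citations are not grounded in the evidence they point to, which is the misattribution failure the protocol is designed to surface.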