A Guide to Misinformation Detection Data and Evaluation

📅 2024-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Misinformation detection research has long suffered from low-quality datasets, pervasive label noise, and non-robust evaluation protocols—leading to inflated performance estimates and non-generalizable conclusions. To address this, we conduct a systematic audit of 75 misinformation detection datasets and introduce the “Evaluation Quality Assessment” (EQA) paradigm. EQA integrates data quality auditing, statistical bias analysis, and cross-dataset reproduction of state-of-the-art models to uncover three fundamental flaws: widespread indeterminate samples, systematic label distortion, and the inadequacy of discrete labels in capturing fine-grained model capabilities. We perform in-depth quality assessments on 36 claim-level and 9 paragraph-level datasets, curate the largest publicly available misinformation dataset compilation to date, and open-source the EQA toolkit alongside all evaluation resources. This work establishes foundational infrastructure for reproducible, comparable, and trustworthy empirical research in misinformation detection.

📝 Abstract
Misinformation is a complex societal issue, and mitigating solutions are difficult to create due to data deficiencies. To address this, we have curated the largest collection of (mis)information datasets in the literature, totaling 75. From these, we evaluated the quality of 36 datasets that consist of statements or claims, as well as the 9 datasets that consist of data in purely paragraph form. We assess these datasets to identify those with solid foundations for empirical work and those with flaws that could result in misleading and non-generalizable results, such as spurious correlations, or examples that are ambiguous or otherwise impossible to assess for veracity. We find the latter issue is particularly severe and affects most datasets in the literature. We further provide state-of-the-art baselines on all these datasets, but show that regardless of label quality, categorical labels may no longer give an accurate evaluation of detection model performance. Finally, we propose and highlight Evaluation Quality Assessment (EQA) as a tool to guide the field toward systemic solutions rather than inadvertently propagating issues in evaluation. Overall, this guide aims to provide a roadmap for higher quality data and better grounded evaluations, ultimately improving research in misinformation detection. All datasets and other artifacts are available at misinfo-datasets.complexdatalab.com.
Problem

Research questions and friction points this paper is trying to address.

Addressing data deficiencies that hinder misinformation detection research.
Evaluating dataset quality to avoid misleading and non-generalizable results.
Proposing Evaluation Quality Assessment (EQA) for systemic improvements to evaluation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated the largest misinformation dataset collection in the literature (75 datasets)
Proposed the Evaluation Quality Assessment (EQA) tool
Provided state-of-the-art baselines on all evaluated datasets
Camille Thibault — Mila, Université de Montréal
Jacob-Junqi Tian — McGill University (Natural Language Processing, Reinforcement Learning)
Gabrielle Peloquin-Skulski — MIT
Taylor Lynn Curtis
James Zhou
Florence Laflamme — Mila, Université de Montréal
Yuxiang Guan
Reihaneh Rabbany — Assistant Professor of Computer Science, McGill University; Canada CIFAR AI Chair, Mila (Data Mining, Machine Learning, Graph Mining, Network Science, Computational Social Science)
Jean-François Godbout — Mila, Université de Montréal
Kellin Pelrine — FAR.AI (AI Security, AI Agents)