MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust Check-Worthiness Detection Models

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenges of data imbalance and limited generalization in automated check-worthy claim detection across multilingual, multi-domain, and multi-style settings. To this end, the authors introduce MultiCW, a large-scale, strictly class-balanced benchmark dataset spanning 16 languages, 7 thematic domains, and 2 writing styles, comprising 123,722 training instances and 27,761 out-of-distribution test samples. The work presents a systematic comparison between fine-tuned multilingual Transformer models and 15 zero-shot large language models. Results demonstrate that fine-tuned models significantly outperform zero-shot approaches on check-worthiness classification and exhibit strong cross-lingual, cross-domain, and cross-style generalization.

📝 Abstract
Large Language Models (LLMs) are beginning to reshape how media professionals verify information, yet automated support for detecting check-worthy claims, a key step in the fact-checking process, remains limited. We introduce the Multi-Check-Worthy (MultiCW) dataset, a balanced multilingual benchmark for check-worthy claim detection spanning 16 languages, 7 topical domains, and 2 writing styles. It consists of 123,722 samples, evenly distributed between noisy (informal) and structured (formal) texts, with balanced representation of check-worthy and non-check-worthy classes across all languages. To probe robustness, we also introduce an equally balanced out-of-distribution evaluation set of 27,761 samples in 4 additional languages. To provide baselines, we benchmark 3 common fine-tuned multilingual transformers against a diverse set of 15 commercial and open LLMs under zero-shot settings. Our findings show that fine-tuned models consistently outperform zero-shot LLMs on claim classification and exhibit strong out-of-distribution generalization across languages, domains, and styles. MultiCW provides a rigorous multilingual resource for advancing automated fact-checking and enables systematic comparisons between fine-tuned models and cutting-edge LLMs on the check-worthy claim detection task.
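As a rough illustration of the fine-tuned baseline setup described in the abstract, the sketch below fine-tunes a multilingual encoder for binary check-worthiness classification with the Hugging Face Trainer. This is not the authors' released code: the model choice (xlm-roberta-base), file names, column names, and hyperparameters are assumptions for illustration only.

```python
# Minimal illustrative sketch (not the authors' code): fine-tune a multilingual
# encoder for binary check-worthiness classification on a MultiCW-style dataset.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-base"  # one plausible multilingual baseline (assumed)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Assumed schema: a "text" column and a binary "label" column
# (1 = check-worthy, 0 = not check-worthy); file names are hypothetical.
data = load_dataset(
    "csv",
    data_files={"train": "multicw_train.csv", "test": "multicw_ood_test.csv"},
)

def tokenize(batch):
    # Truncate long posts/articles; padding is applied dynamically at batch time.
    return tokenizer(batch["text"], truncation=True, max_length=256)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="checkworthiness-xlmr",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
print(trainer.evaluate())  # reports loss only; F1/accuracy need a compute_metrics fn
```

The zero-shot LLM baselines from the paper would instead prompt each model with the claim text and parse a check-worthy / not-check-worthy answer; the exact prompts are not reproduced here.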
Problem

Research questions and friction points this paper is trying to address.

check-worthiness detection
fact-checking
multilingual benchmark
out-of-distribution generalization
automated claim verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

check-worthiness detection
multilingual benchmark
out-of-distribution generalization
fact-checking
large language models
Martin Hyben
Kempelen Institute of Intelligent Technologies, Bratislava, Slovakia
Sebastian Kula
Kempelen Institute of Intelligent Technologies, Bratislava, Slovakia; West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Jan Cegin
Kempelen Institute of Intelligent Technologies, Bratislava, Slovakia
Jakub Simko
Expert researcher, Kempelen Institute of Intelligent Technologies
user modelling, data analysis, machine learning, crowdsourcing, eye-tracking
Ivan Srba
Kempelen Institute of Intelligent Technologies
AI, Machine Learning, Natural Language Processing, Social Computing, Disinformation
Robert Moro
Senior Researcher at Kempelen Institute of Intelligent Technologies
Artificial Intelligence, Machine Learning, User Modeling, Personalization, Eye Tracking