🤖 AI Summary
This study addresses data imbalance and limited generalization in automated check-worthy claim detection across multilingual, multidomain, and multi-style settings. To this end, the authors introduce MultiCW, a large-scale, class-balanced benchmark dataset spanning 16 languages, 7 topical domains, and 2 writing styles. It comprises 123,722 samples, plus an equally balanced out-of-distribution evaluation set of 27,761 samples in 4 additional languages. The work systematically compares 3 fine-tuned multilingual transformer models against 15 commercial and open large language models in zero-shot settings. Results show that the fine-tuned models consistently outperform the zero-shot approaches on claim classification and generalize well out of distribution across languages, domains, and styles.
📝 Abstract
Large Language Models (LLMs) are beginning to reshape how media professionals verify information, yet automated support for detecting check-worthy claims, a key step in the fact-checking process, remains limited. We introduce the Multi-Check-Worthy (MultiCW) dataset, a balanced multilingual benchmark for check-worthy claim detection spanning 16 languages, 7 topical domains, and 2 writing styles. It consists of 123,722 samples, evenly distributed between noisy (informal) and structured (formal) texts, with balanced representation of check-worthy and non-check-worthy classes across all languages. To probe robustness, we also introduce an equally balanced out-of-distribution evaluation set of 27,761 samples in 4 additional languages. To provide baselines, we benchmark 3 common fine-tuned multilingual transformers against a diverse set of 15 commercial and open LLMs under zero-shot settings. Our findings show that fine-tuned models consistently outperform zero-shot LLMs on claim classification and show strong out-of-distribution generalization across languages, domains, and styles. MultiCW provides a rigorous multilingual resource for advancing automated fact-checking and enables systematic comparisons between fine-tuned models and cutting-edge LLMs on the check-worthy claim detection task.
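The balance property described in the abstract (equal shares of check-worthy and non-check-worthy samples within each language and each writing style) can be illustrated with a small check. The sketch below is not the authors' code: the field names `language`, `style`, and `label`, the helper `class_balance_by_group`, and the toy records are all hypothetical, and the actual MultiCW schema may differ.

```python
from collections import Counter, defaultdict


def class_balance_by_group(samples, group_key="language", label_key="label"):
    """Return {group: fraction of samples labelled 1 (check-worthy)}.

    Field names are assumptions for illustration, not the MultiCW schema.
    """
    counts = defaultdict(Counter)
    for sample in samples:
        counts[sample[group_key]][sample[label_key]] += 1
    return {
        group: label_counts[1] / sum(label_counts.values())
        for group, label_counts in counts.items()
    }


# Toy records with arbitrary language codes; these are NOT MultiCW data.
toy = [
    {"language": "en", "style": "noisy", "label": 1},
    {"language": "en", "style": "structured", "label": 0},
    {"language": "de", "style": "noisy", "label": 0},
    {"language": "de", "style": "structured", "label": 1},
]

balance = class_balance_by_group(toy)                          # per language
style_balance = class_balance_by_group(toy, group_key="style")  # per style
print(balance)
print(style_balance)
```

On a dataset balanced in the sense the abstract describes, every value returned for the language and style groupings would be 0.5, or close to it if a group has an odd number of samples.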