🤖 AI Summary
This study addresses the limited generalization of existing fake news detection methods under domain shift and out-of-distribution scenarios, as well as the absence of a unified evaluation benchmark. It is the first to systematically evaluate twelve representative approaches (traditional machine learning models, deep neural networks, Transformers, specialized cross-domain architectures, and large language models) across ten English datasets under a consistent binary-label protocol, assessing in-domain, multi-domain, and cross-domain performance. The findings reveal that fine-tuned models excel in-domain but generalize poorly across domains; cross-domain architectures mitigate performance degradation yet require abundant training data; and large language models demonstrate superior robustness and adaptability in zero-shot and few-shot settings. This work establishes a comprehensive benchmark and offers empirical insights into the generalization capabilities of fake news detection systems.
📝 Abstract
In recent years, fake news detection has received increasing attention in public debate and scientific research. Despite advances in detection techniques, the production and spread of false information have become more sophisticated, driven by Large Language Models (LLMs) and the amplification power of social media. We present a critical assessment of 12 representative fake news detection approaches, spanning traditional machine learning, deep learning, transformers, and specialized cross-domain architectures. We evaluate these methods on 10 publicly available datasets differing in genre, source, topic, and labeling rationale. We address text-only English fake news detection as a binary classification task by harmonizing labels into "Real" and "Fake" to ensure a consistent evaluation protocol. We acknowledge that label semantics vary across datasets and that harmonization inevitably removes such semantic nuances. Each dataset is treated as a distinct domain. We conduct in-domain, multi-domain, and cross-domain experiments to simulate real-world scenarios involving domain shift and out-of-distribution data. Fine-tuned models perform well in-domain but struggle to generalize. Cross-domain architectures can reduce this gap but are data-hungry, while LLMs offer a promising alternative through zero- and few-shot learning. Given inherent dataset confounds and possible pre-training exposure, results should be interpreted as robustness evaluations within this English, text-only protocol.
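The label-harmonization step described above can be sketched as follows. This is a minimal illustration, not the paper's actual mapping: the dataset names and label vocabularies below (e.g. collapsing a multi-way truthfulness scale) are hypothetical assumptions made for the example, and they also show how harmonization discards semantic nuance, as the abstract notes.

```python
# Hypothetical sketch of harmonizing dataset-specific labels into the
# unified binary "Real"/"Fake" scheme. Dataset names and their native
# label sets below are illustrative assumptions, not the paper's mappings.

LABEL_MAPS = {
    # Hypothetical: a dataset with a 6-way truthfulness scale is collapsed
    # to binary, losing gradations such as "half-true" vs "false".
    "dataset_a": {
        "true": "Real", "mostly-true": "Real",
        "half-true": "Fake", "barely-true": "Fake",
        "false": "Fake", "pants-fire": "Fake",
    },
    # Hypothetical: a dataset that is already binary, with different casing.
    "dataset_b": {"real": "Real", "fake": "Fake"},
}

def harmonize(dataset: str, label: str) -> str:
    """Map a dataset-specific label to the unified 'Real'/'Fake' scheme."""
    try:
        return LABEL_MAPS[dataset][label.strip().lower()]
    except KeyError as exc:
        # Fail loudly rather than silently mislabel an example.
        raise ValueError(f"Unmapped label {label!r} in dataset {dataset!r}") from exc
```

Treating each harmonized dataset as a separate domain then lets the same binary protocol drive the in-domain, multi-domain, and cross-domain splits.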