An Experimental Comparison of the Most Popular Approaches to Fake News Detection

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited generalization of existing fake news detection methods under domain shift and out-of-distribution scenarios, as well as the absence of a unified evaluation benchmark. For the first time, it systematically evaluates twelve representative approaches—including traditional machine learning models, deep neural networks, Transformers, specialized cross-domain architectures, and large language models—across ten English datasets under a consistent binary-label protocol, assessing in-domain, multi-domain, and cross-domain performance. The findings reveal that fine-tuned models excel in-domain but generalize poorly across domains; cross-domain architectures mitigate performance degradation yet rely heavily on abundant training data; and large language models demonstrate superior robustness and adaptability in zero-shot and few-shot settings. This work establishes a comprehensive benchmark and offers empirical insights into the generalization capabilities of fake news detection systems.

📝 Abstract
In recent years, fake news detection has received increasing attention in public debate and scientific research. Despite advances in detection techniques, the production and spread of false information have become more sophisticated, driven by Large Language Models (LLMs) and the amplification power of social media. We present a critical assessment of 12 representative fake news detection approaches, spanning traditional machine learning, deep learning, transformers, and specialized cross-domain architectures. We evaluate these methods on 10 publicly available datasets differing in genre, source, topic, and labeling rationale. We address text-only English fake news detection as a binary classification task by harmonizing labels into "Real" and "Fake" to ensure a consistent evaluation protocol. We acknowledge that label semantics vary across datasets and that harmonization inevitably removes such semantic nuances. Each dataset is treated as a distinct domain. We conduct in-domain, multi-domain and cross-domain experiments to simulate real-world scenarios involving domain shift and out-of-distribution data. Fine-tuned models perform well in-domain but struggle to generalize. Cross-domain architectures can reduce this gap but are data-hungry, while LLMs offer a promising alternative through zero- and few-shot learning. Given inherent dataset confounds and possible pre-training exposure, results should be interpreted as robustness evaluations within this English, text-only protocol.
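The abstract's evaluation protocol hinges on harmonizing heterogeneous dataset labels into a binary "Real"/"Fake" scheme. A minimal sketch of such a mapping step follows; the label vocabularies and mapping choices are illustrative assumptions, not the paper's actual code:

```python
# Sketch of binary label harmonization across datasets with different
# labeling rationales. The source label sets below are hypothetical.
LABEL_MAP = {
    "true": "Real",
    "mostly-true": "Real",
    "reliable": "Real",
    "false": "Fake",
    "pants-fire": "Fake",
    "unreliable": "Fake",
}

def harmonize(label: str) -> str:
    """Map a dataset-specific label onto the binary protocol."""
    try:
        return LABEL_MAP[label.lower()]
    except KeyError:
        # Labels with no clear binary reading must be handled explicitly,
        # since harmonization discards semantic nuance by design.
        raise ValueError(f"Unmapped label: {label!r}")

# Example: a small corpus mixing two labeling schemes.
corpus = [("Story A", "mostly-true"), ("Story B", "pants-fire")]
harmonized = [(text, harmonize(lbl)) for text, lbl in corpus]
# harmonized == [("Story A", "Real"), ("Story B", "Fake")]
```

As the abstract notes, any such mapping flattens label semantics (e.g. "mostly-true" and "true" become indistinguishable), which is the trade-off accepted for a consistent cross-dataset protocol.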
Problem

Research questions and friction points this paper is trying to address.

fake news detection
domain shift
cross-domain generalization
label harmonization
text classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

fake news detection
cross-domain generalization
large language models
zero-shot learning
systematic evaluation
Pietro Dell'Oglio
Dipartimento di Ingegneria dell’Informazione, University of Pisa, Largo Lucio Lazzarino, 1, Pisa, Italy
Alessandro Bondielli
Dipartimento di Informatica, University of Pisa, Largo B. Pontecorvo, 3, Pisa, Italy
Francesco Marcelloni
Professor of Data Mining and Machine Learning, University of Pisa, Circle U. Alliance
Artificial Intelligence, Federated Learning, Computational Intelligence, Big Data Mining, Fuzzy
Lucia C. Passaro
University of Pisa
Natural Language Processing, Computational Linguistics, Semantics