Uncertainty-Aware Estimation of Mis/Disinformation Prevalence on Social Media

📅 2026-03-03
🤖 AI Summary
This study addresses the challenge of estimating the prevalence of false or misleading information on social media, a task subject to multiple sources of uncertainty, including sampling variability, inter-annotator disagreement in manual labeling, and the choice of keywords used for data retrieval. To support more robust mitigation strategies, the work proposes a unified framework that jointly quantifies these three sources by combining multinomial simulation with bootstrap resampling. The approach yields uncertainty-aware prevalence estimates and is evaluated on a multi-platform, multilingual dataset annotated by professional fact-checkers. Empirical results show that uncertainty stemming from keyword-based retrieval can exceed baseline variability, leading to wider confidence intervals. These findings underscore the value of jointly modeling all major uncertainty sources when estimating misinformation prevalence.

📝 Abstract
Estimating the prevalence of mis/disinformation on social media is crucial for designing mitigation strategies to limit its impact. Yet such estimates are subject to several uncertainties that are rarely quantified jointly. In this study, we present a methodological contribution in which confidence intervals are used to quantify the uncertainties surrounding mis/disinformation prevalence. The analysis draws on a multi-platform, multilingual dataset annotated by professional fact-checkers. Data were collected between March and April 2025 from Facebook, Instagram, LinkedIn, TikTok, X/Twitter, and YouTube across four EU Member States (France, Poland, Slovakia, and Spain). We account for three sources of uncertainty: (i) sample uncertainty, (ii) annotation uncertainty arising from human disagreement and misclassification, and (iii) data retrieval uncertainty induced by keyword-based data collection. First, we estimate the uncertainty arising from each source separately using confidence intervals, simulation-based methods, and bootstrapping. Then, we combine multinomial simulations of annotator behaviour with resampling of keywords and posts to capture the joint impact of measurement uncertainty on mis/disinformation prevalence estimates. The proposed approach highlights the importance of uncertainty-aware estimation of mis/disinformation prevalence for robust analysis. The empirical results show that the uncertainty introduced by keyword-based data retrieval can exceed baseline variability, leading to wider confidence intervals around prevalence estimates.
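
The joint procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label set, the toy data, the function name `simulate_prevalence`, the nesting order of the resampling steps, and the use of a single multinomial draw (n = 1) per post are all assumptions made for illustration. Each replicate resamples keywords, then posts within each keyword, then draws a label for each post from its annotator vote shares; a percentile interval is taken over the replicates.

```python
import random
from collections import defaultdict

# Hypothetical label set; the paper's actual categories may differ.
LABELS = ("misinfo", "credible", "unclear")


def simulate_prevalence(posts, n_reps=2000, alpha=0.05, seed=0):
    """Return a percentile interval (low, high) for misinformation prevalence.

    posts: list of (keyword, votes) pairs, where votes is a tuple of
    annotator vote counts aligned with LABELS (at least one vote per post).
    """
    rng = random.Random(seed)

    # Group posts by the keyword that retrieved them.
    by_keyword = defaultdict(list)
    for keyword, votes in posts:
        by_keyword[keyword].append(votes)
    keywords = sorted(by_keyword)

    estimates = []
    for _ in range(n_reps):
        hits = total = 0
        # (iii) retrieval uncertainty: resample keywords with replacement.
        for kw in rng.choices(keywords, k=len(keywords)):
            pool = by_keyword[kw]
            # (i) sample uncertainty: resample posts within the keyword.
            for votes in rng.choices(pool, k=len(pool)):
                # (ii) annotation uncertainty: draw one label with
                # probabilities proportional to the annotator votes.
                label = rng.choices(LABELS, weights=votes, k=1)[0]
                hits += label == "misinfo"
                total += 1
        estimates.append(hits / total)

    # Percentile interval over the simulated prevalence values.
    estimates.sort()
    low = estimates[int((alpha / 2) * n_reps)]
    high = estimates[int((1 - alpha / 2) * n_reps) - 1]
    return low, high


# Toy data (invented for illustration): three keywords, three annotators.
posts = [
    ("vaccine", (3, 0, 0)),
    ("vaccine", (0, 3, 0)),
    ("vaccine", (1, 2, 0)),
    ("climate", (0, 3, 0)),
    ("climate", (2, 1, 0)),
    ("election", (0, 2, 1)),
    ("election", (0, 3, 0)),
    ("election", (3, 0, 0)),
]
low, high = simulate_prevalence(posts)
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```

Running the same function with only one of the three resampling steps enabled would give the separate, per-source intervals that the abstract describes as the first stage of the analysis; comparing their widths against the joint interval is one way to see which source dominates.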
Problem

Research questions and friction points this paper is trying to address.

misinformation
disinformation
uncertainty quantification
prevalence estimation
social media
Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty-aware estimation
mis/disinformation prevalence
confidence intervals
multinomial simulation
data retrieval bias
Ishari Amarasinghe
Ethical Technologies and Connectivity for Humanity Research Centre, Universitat Oberta de Catalunya, Rambla del Poblenou 154-156, Barcelona, 08018, Spain; Department of Engineering, Universitat Pompeu Fabra, Tànger 122, Barcelona, 08018, Spain
Salvatore Romano
Ethical Technologies and Connectivity for Humanity Research Centre, Universitat Oberta de Catalunya, Rambla del Poblenou 154-156, Barcelona, 08018, Spain
Jacopo Amidei
Universitat Oberta de Catalunya (UOC)
Evaluation of Natural Language Generation Systems, Data Annotation and Reliability, Dialogue
Emmanuel M. Vincent
Science Feedback, 21 Place de la République, Paris, 75003, France
Andreas Kaltenbrunner
Universitat Oberta de Catalunya, Universitat Pompeu Fabra
Social Networks, Computational Social Science, Social Media, Wikipedia, User Behaviour