Enhancing Rumor Detection Methods with Propagation Structure Infused Language Model

📅 2025-08-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pretrained language models (PLMs) underperform in social media rumor detection due to the domain mismatch between pretraining corpora and informal social text, inadequate handling of social-specific symbols, and the lack of explicit modeling of user interactions within propagation structures. To address these limitations, the authors propose Post Engagement Prediction (PEP), a continual pretraining strategy that formalizes the root, parent, and branch relational structure of information diffusion graphs as a self-supervised objective, thereby sharpening PLMs' sensitivity to stance evolution and affective interaction. Evaluated with newly curated resources (TwitterCorpus, UTwitter, UWeibo), PEP consistently improves rumor detection accuracy by 1.0-3.7% across BERT and RoBERTa variants and remains robust in few-shot settings. The resulting Twitter-tailored model, SoLM, achieves competitive results without high-level modules, underscoring the strategy's effectiveness at learning discriminative post-interaction features.

📝 Abstract
Pretrained Language Models (PLMs) have excelled in various Natural Language Processing tasks, benefiting from large-scale pretraining and the self-attention mechanism's ability to capture long-range dependencies. However, their performance on social media tasks like rumor detection remains suboptimal. We attribute this to mismatches between pretraining corpora and social texts, inadequate handling of unique social symbols, and pretraining tasks ill-suited for modeling the user engagements implicit in propagation structures. To address these issues, we propose a continual pretraining strategy called Post Engagement Prediction (PEP) to infuse information from propagation structures into PLMs. PEP trains models to predict root, branch, and parent relations between posts, capturing the interactions of stance and sentiment crucial for rumor detection. We also curate and release a large-scale Twitter corpus, TwitterCorpus (269GB of text), and two unlabeled claim conversation datasets with propagation structures (UTwitter and UWeibo). Utilizing these resources and the PEP strategy, we train a Twitter-tailored PLM called SoLM. Extensive experiments demonstrate that PEP significantly boosts rumor detection performance across universal and social media PLMs, even in few-shot scenarios. On benchmark datasets, PEP improves baseline models by 1.0-3.7% accuracy, enabling them to outperform current state-of-the-art methods on multiple datasets. SoLM alone, without high-level modules, also achieves competitive results, highlighting the strategy's effectiveness in learning discriminative post interaction features.
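The core of PEP is deriving self-supervised labels from a conversation's reply tree: for a pair of posts, predict whether one is the root claim, the direct parent, or lies on the same reply branch. The paper does not spell out the exact labeling procedure, so the sketch below is a minimal illustrative assumption: a toy conversation stored as a post-to-parent map, with hypothetical helper names (`branch_of`, `pep_label`) and a simplified label scheme.

```python
def branch_of(post, parent):
    """Return the chain of ancestors from `post` up to the root claim."""
    path = [post]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def pep_label(a, b, parent):
    """Label post `b`'s relation to post `a` (illustrative scheme only)."""
    if parent[b] is None:
        return "root"      # b is the root claim post
    if parent[a] == b or parent[b] == a:
        return "parent"    # direct reply relation between a and b
    if b in branch_of(a, parent) or a in branch_of(b, parent):
        return "branch"    # same reply chain (ancestor/descendant)
    return "none"          # unrelated posts in the conversation

# Toy conversation: post 0 is the root claim, 1 and 3 reply to it,
# and 2 replies to 1.
parent = {0: None, 1: 0, 2: 1, 3: 0}
print(pep_label(2, 0, parent))  # -> root
print(pep_label(2, 1, parent))  # -> parent
print(pep_label(1, 3, parent))  # -> none
```

In the actual continual pretraining setup, pairs of post texts would be encoded by the PLM and a classification head trained on labels like these, so the model internalizes propagation structure without any graph module at inference time.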
Problem

Research questions and friction points this paper is trying to address.

Improving rumor detection in social media texts
Addressing mismatches between pretraining corpora and social texts with unique symbols
Enhancing PLMs with propagation structure information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post Engagement Prediction for propagation infusion
Twitter-tailored PLM named SoLM
Large-scale TwitterCorpus and datasets
Chaoqun Cui
Institute of Automation, Chinese Academy of Sciences
Machine Learning, Natural Language Processing
Siyuan Li
School of Computer Science and Technology & Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
Kunkun Ma
School of Computer Science and Technology & Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
Caiyan Jia
School of Computer Science and Technology & Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China