FanChuan: A Multilingual and Graph-Structured Benchmark For Parody Detection and Analysis

📅 2025-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Parody detection on social media is challenging because it depends strongly on context, bilingual data is scarce, and existing datasets lack structural modeling. This paper introduces the first bilingual (Chinese-English), graph-structured benchmark for parody detection, comprising 14,755 annotated users and 21,210 annotated comments, and proposes a three-level heterogeneous user-comment-reply graph to model how parody propagates through conversations. It defines a unified cross-lingual, multi-task evaluation framework that combines parody detection with comment-level and user-level sentiment analysis. Empirical results show that, in certain scenarios, lightweight models (e.g., BERT+SVM) can outperform advanced large language models (DeepSeek-R1, GPT-o3), underscoring the role of structured contextual modeling in parody understanding. To support reproducibility, the dataset, source code, and evaluation protocols are publicly released.

📝 Abstract
Parody is an emerging phenomenon on social media, where individuals imitate a role or position opposite to their own, often for humor, provocation, or controversy. Detecting and analyzing parody is challenging and often depends on context, yet it plays a crucial role in understanding cultural values, promoting subcultures, and enhancing self-expression. However, the study of parody is hindered by the limited amount of available data and the insufficient diversity of current datasets. To bridge this gap, we build seven parody datasets from both English and Chinese corpora, with 14,755 annotated users and 21,210 annotated comments in total. We also collect replies and construct user-interaction graphs to provide the richer contextual information that existing datasets lack. With these datasets, we test traditional methods and Large Language Models (LLMs) on three key tasks: (1) parody detection, (2) comment sentiment analysis with parody, and (3) user sentiment analysis with parody. Our extensive experiments reveal that parody-related tasks remain challenging for all models and that contextual information plays a critical role. Interestingly, we find that, in certain scenarios, traditional sentence embedding methods combined with simple classifiers can outperform advanced LLMs, i.e., DeepSeek-R1 and GPT-o3, highlighting parody as a significant challenge for LLMs.
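The idea of collecting replies and turning them into a user-interaction graph can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Comment` record, its field names, and the helper functions `build_user_graph` and `reply_context` are hypothetical and are not taken from the paper or its released code, whose actual schema and graph construction may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Comment:
    # Hypothetical record layout; the real dataset schema may differ.
    comment_id: str
    user_id: str
    text: str
    parent_id: Optional[str] = None  # None marks a top-level comment


def build_user_graph(comments):
    """Return {replier: {replied_to_user: reply_count}} from reply links."""
    author_of = {c.comment_id: c.user_id for c in comments}
    graph = defaultdict(lambda: defaultdict(int))
    for c in comments:
        if c.parent_id is None:
            continue
        parent_author = author_of.get(c.parent_id)
        # Skip replies whose parent is missing and replies to oneself.
        if parent_author is None or parent_author == c.user_id:
            continue
        graph[c.user_id][parent_author] += 1
    return {user: dict(neighbors) for user, neighbors in graph.items()}


def reply_context(comments, comment_id):
    """Return the texts of the direct replies to one comment, in input order."""
    return [c.text for c in comments if c.parent_id == comment_id]


# Toy thread (invented data, not from the benchmark).
comments = [
    Comment("c1", "alice", "Best phone ever, it only explodes sometimes."),
    Comment("c2", "bob", "Haha, nice one.", parent_id="c1"),
    Comment("c3", "carol", "Are you serious?", parent_id="c1"),
    Comment("c4", "alice", "Completely serious.", parent_id="c3"),
]
graph = build_user_graph(comments)
```

In this sketch, the replies gathered by `reply_context` could be passed to a classifier alongside the comment text, while the weighted edges in `graph` stand in for the kind of structural context a graph-based model might use.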
Problem

Research questions and friction points this paper is trying to address.

Detecting parody in social media
Analyzing sentiment with parody
Improving dataset diversity for parody study
Innovation

Methods, ideas, or system contributions that make the work stand out.

Creation of multilingual parody datasets
User-interaction graphs for context
Traditional methods can outperform LLMs in certain scenarios
Authors
Yilun Zheng
Nanyang Technological University, Centre for Info. Sciences and Systems
Sha Li
Nanyang Technological University, Centre for Info. Sciences and Systems
Fangkun Wu
Nanyang Technological University, Centre for Info. Sciences and Systems
Yang Ziyi
Nanyang Technological University, Centre for Info. Sciences and Systems
Lin Hongchao
Nanyang Technological University, Centre for Info. Sciences and Systems
Zhichao Hu
Nanyang Technological University, Centre for Info. Sciences and Systems
Cai Xinjun
Nanyang Technological University, Centre for Info. Sciences and Systems
Ziming Wang
Nanyang Technological University, Centre for Info. Sciences and Systems
Jinxuan Chen
Nanyang Technological University, Centre for Info. Sciences and Systems
Sitao Luan
University of Montreal, Mila
Graph Learning, AI4Science, Graph for LLM, LLM for Graph, RL Reasoning
Jiahao Xu
Nanyang Technological University
LLM Efficient Reasoning, NMT, Audio Translation, Sentence Embeddings
Lihui Chen
Nanyang Technological University, Centre for Info. Sciences and Systems