A GAN and LLM-Driven Data Augmentation Framework for Dynamic Linguistic Pattern Modeling in Chinese Sarcasm Detection

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses two obstacles in Chinese sarcasm detection: scarce, costly-to-annotate data, and the neglect of individual users' linguistic idiosyncrasies in existing approaches. The work proposes a framework that integrates users' long-term language behavior into sarcasm recognition, combining a generative adversarial network (GAN) with GPT-3.5-based data augmentation. This pipeline yields SinaSarc, a sarcasm corpus that pairs each target comment with contextual information and user history. The authors also extend the BERT architecture to dynamically model personalized language styles. In the reported experiments, the model outperforms existing state-of-the-art approaches, achieving F1 scores of 0.9151 and 0.9138 for the sarcastic and non-sarcastic classes, respectively.
📝 Abstract
Sarcasm is a rhetorical device that expresses criticism or emphasizes characteristics of certain individuals or situations through exaggeration, irony, or comparison. Existing methods for Chinese sarcasm detection are constrained by limited datasets and high construction costs, and they mainly focus on textual features, overlooking user-specific linguistic patterns that shape how opinions and emotions are expressed. This paper proposes a Generative Adversarial Network (GAN) and Large Language Model (LLM)-driven data augmentation framework to dynamically model users' linguistic patterns for enhanced Chinese sarcasm detection. First, we collect raw data from various topics on Sina Weibo. Then, we train a GAN on these data and apply a GPT-3.5 based data augmentation technique to synthesize an extended sarcastic comment dataset, named SinaSarc. This dataset contains target comments, contextual information, and user historical behavior. Finally, we extend the BERT architecture to incorporate multi-dimensional information, particularly user historical behavior, enabling the model to capture dynamic linguistic patterns and uncover implicit sarcastic cues in comments. Experimental results demonstrate the effectiveness of our proposed method. Specifically, our model achieves the highest F1-scores on both the non-sarcastic and sarcastic categories, with values of 0.9138 and 0.9151 respectively, which outperforms all existing state-of-the-art (SOTA) approaches. This study presents a novel framework for dynamically modeling users' long-term linguistic patterns in Chinese sarcasm detection, contributing to both dataset construction and methodological advancement in this field.
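As a rough illustration of what "incorporating multi-dimensional information" could look like at the input level, the sketch below packs a target comment, its context, and a user's recent posts into one BERT-style token sequence. It is not the authors' implementation: the abstract does not say how the three sources are fused, so the function name `pack_inputs`, the three-way segment ids (which would need a segment embedding table larger than standard BERT's two entries), the character-level tokenization, and the newest-first truncation of history are all assumptions made for illustration.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def pack_inputs(
    target: str,
    context: str,
    history: List[str],
    max_len: int = 64,
) -> Tuple[List[str], List[int]]:
    """Pack target, context, and user history into one token sequence.

    Hypothetical helper: characters stand in for real WordPiece tokens,
    and `history` is assumed to be ordered oldest -> newest.
    """
    target_toks = list(target)
    context_toks = list(context)

    # Fixed cost: [CLS], plus one [SEP] after each of the three parts.
    budget = max_len - (len(target_toks) + len(context_toks) + 4)
    if budget < 0:
        raise ValueError("target and context alone exceed max_len")

    # Fill the remaining budget with whole posts, newest first,
    # while preserving chronological order in the final sequence.
    history_toks: List[str] = []
    for post in reversed(history):
        toks = list(post)
        if len(toks) > budget:
            break
        history_toks = toks + history_toks
        budget -= len(toks)

    tokens = (
        [CLS] + target_toks + [SEP]
        + context_toks + [SEP]
        + history_toks + [SEP]
    )
    # Segment ids: 0 = target comment, 1 = context, 2 = user history.
    segments = (
        [0] * (len(target_toks) + 2)
        + [1] * (len(context_toks) + 1)
        + [2] * (len(history_toks) + 1)
    )
    return tokens, segments
```

Calling `pack_inputs(comment, context, posts)` returns two lists of equal length that could feed an encoder's token and segment embeddings. Dropping older posts first is only one possible choice; it reflects the assumption that recent posts are most informative about a user's current style.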
Problem

Research questions and friction points this paper is trying to address.

Chinese sarcasm detection
data augmentation
linguistic patterns
user-specific behavior
limited datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

data augmentation
linguistic pattern modeling
Chinese sarcasm detection
GAN-LLM framework
user historical behavior
Wenxian Wang
Key Laboratory of Data Protection and Intelligent Management, the Cyber Science Research Institute, Sichuan University, Chengdu 610207, China
Xiaohu Luo
School of Cyber Science and Engineering, Sichuan University, Chengdu 610207, China
Junfeng Hao
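Chief Physician, Hemodialysis Center, Affiliated Hospital of Guangdong Medical University
Nephrology, hemodialysis, hemodialysis vascular access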
Xiaoming Gu
State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China
Xingshu Chen
Professor of Computer Science, Sichuan University
Cybersecurity
Zhu Wang
Law School, Sichuan University, Chengdu 610207, China
Haizhou Wang
School of Cyber Science and Engineering, Sichuan University
fake information detection, social network analysis