AI Summary
This study investigates how humor, particularly irony, sarcasm, and absurdity, enhances the deceptive efficacy of misinformation and facilitates its cross-lingual propagation. To address this, we introduce DHD, the first multilingual synthetic benchmark dataset specifically designed for deceptive humor detection, covering six languages (e.g., English, Hindi, Telugu) and four code-mixed variants. Methodologically, we propose a novel three-level irony intensity scale and a five-category humor taxonomy, and pioneer a "large language model generation + multi-stage human verification" synthesis paradigm, integrated with a multilingual NLP pipeline for text cleaning, code-mixing identification, and annotation alignment. DHD comprises over 10,000 high-quality samples. Fine-tuned RoBERTa and XLM-R baselines achieve 78.3%–86.1% accuracy on irony detection and humor classification, significantly outperforming zero-shot cross-lingual transfer, thereby establishing a robust foundation for multilingual deceptive humor analysis.
Abstract
This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated with the ChatGPT-4o model from false narratives that incorporate fabricated claims and manipulated information. Each instance is labeled with a Satire Level, ranging from 1 for subtle satire to 3 for high-level satire, and is classified into one of five Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages, including English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We also establish strong baselines on the dataset, giving future work a reference point for benchmarking and advancing deceptive humor detection models.
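The labeling scheme described in the abstract can be pictured as a small record type. The following is only an illustrative sketch, not the dataset's actual release format: the class name `DHDInstance`, its field names, and the constant names are hypothetical, and only the label values (Satire Levels 1 to 3, the five Humor Categories, and the listed languages and code-mixed variants) come from the abstract.

```python
from dataclasses import dataclass

# Label values are taken from the abstract; all names here are hypothetical.
SATIRE_LEVELS = {1, 2, 3}  # 1 = subtle satire, 3 = high-level satire
HUMOR_CATEGORIES = {
    "Dark Humor", "Irony", "Social Commentary", "Wordplay", "Absurdity",
}
LANGUAGES = {
    "English", "Telugu", "Hindi", "Kannada", "Tamil",
    "Te-En", "Hi-En", "Ka-En", "Ta-En",  # code-mixed variants
}


@dataclass(frozen=True)
class DHDInstance:
    """One humor-infused comment with its labels (assumed layout)."""
    comment: str
    language: str
    satire_level: int
    humor_category: str

    def __post_init__(self):
        # Reject records whose labels fall outside the scheme in the abstract.
        if not self.comment.strip():
            raise ValueError("comment must be non-empty")
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language: {self.language!r}")
        if self.satire_level not in SATIRE_LEVELS:
            raise ValueError(f"satire_level must be 1, 2, or 3, got {self.satire_level!r}")
        if self.humor_category not in HUMOR_CATEGORIES:
            raise ValueError(f"unknown humor category: {self.humor_category!r}")


# Usage with placeholder text (not a real dataset sample).
example = DHDInstance(
    comment="<placeholder comment>",
    language="Te-En",
    satire_level=2,
    humor_category="Irony",
)
```

Validating labels at construction time is one possible design choice; it would surface malformed rows when the data is loaded, before they reach a classifier.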