🤖 AI Summary
This study investigates how humor, particularly irony, sarcasm, and absurdity, enhances the deceptive efficacy of misinformation and facilitates its cross-lingual propagation. To address this, we introduce DHD, the first multilingual synthetic benchmark dataset specifically designed for deceptive humor detection, covering multiple languages (English, Telugu, Hindi, Kannada, and Tamil) and four code-mixed variants. Methodologically, we propose a three-level satire scale and a five-category humor taxonomy, and pioneer a “large language model generation + multi-stage human verification” synthesis paradigm, integrated with a multilingual NLP pipeline for text cleaning, code-mixing identification, and annotation alignment. DHD comprises over 10,000 high-quality samples. Fine-tuned RoBERTa and XLM-R baselines achieve 78.3%–86.1% accuracy on irony detection and humor classification, significantly outperforming zero-shot cross-lingual transfer and thereby establishing a robust foundation for multilingual deceptive humor analysis.
📝 Abstract
This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated with the ChatGPT-4o model from false narratives that incorporate fabricated claims and manipulated information. Each instance is labeled with a Satire Level, ranging from 1 (subtle satire) to 3 (high-level satire), and is classified into one of five Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages, including English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We also establish strong baselines on the dataset, giving future work a reference point for benchmarking and advancing deceptive humor detection models.
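The labeling scheme described in the abstract can be pictured as a small record type. The following Python sketch is illustrative only: the class name `DHDInstance`, the field names, and the monolingual codes (`En`, `Te`, `Hi`, `Ka`, `Ta`) are assumptions made for this example and are not taken from the released dataset; only the Satire Level range, the five Humor Categories, and the code-mixed tags come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class HumorCategory(Enum):
    """The five Humor Categories named in the abstract."""
    DARK_HUMOR = "Dark Humor"
    IRONY = "Irony"
    SOCIAL_COMMENTARY = "Social Commentary"
    WORDPLAY = "Wordplay"
    ABSURDITY = "Absurdity"


# Code-mixed tags (Te-En, Hi-En, Ka-En, Ta-En) appear in the abstract;
# the monolingual codes are hypothetical shorthand for this sketch.
LANGUAGES = ("En", "Te", "Hi", "Ka", "Ta", "Te-En", "Hi-En", "Ka-En", "Ta-En")


@dataclass(frozen=True)
class DHDInstance:
    """One labeled comment; field names are assumed, not the dataset's own."""
    comment: str
    satire_level: int  # 1 = subtle satire, 3 = high-level satire
    humor_category: HumorCategory
    language: str

    def __post_init__(self) -> None:
        # Reject labels outside the ranges described in the abstract.
        if self.satire_level not in (1, 2, 3):
            raise ValueError(f"satire_level must be 1, 2, or 3, got {self.satire_level}")
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language tag: {self.language!r}")


# Usage with placeholder text (not a real dataset entry).
example = DHDInstance(
    comment="<placeholder comment>",
    satire_level=2,
    humor_category=HumorCategory.IRONY,
    language="Te-En",
)
```

Validating labels at construction time is one possible design choice; it keeps malformed rows from silently entering a training set, at the cost of failing fast on any annotation outside the documented ranges.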