Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection

📅 2025-06-01
🤖 AI Summary
Speech-based sarcasm detection faces two challenges: data scarcity and weak performance when only audio is available. To address these, we propose an LLM-driven annotation pipeline in which GPT-4o and LLaMA 3 produce initial sarcasm labels and human annotators verify them, resolving cases where the models disagree. The pipeline is used to build PodSarc, a large-scale, speech-only sarcasm dataset (12,500+ expert-verified samples). A collaborative gating architecture for detection is then used to validate the pipeline by comparing annotation quality and detection performance. Under the audio-only unimodal setting, our method achieves a 73.63% F1 score, outperforming prior baselines by 12.4%. This work establishes a reproducible large-scale benchmark for speech sarcasm detection and a scalable framework for human-in-the-loop LLM annotation.

📝 Abstract
Sarcasm fundamentally alters meaning through tone and context, yet detecting it in speech remains a challenge due to data scarcity. In addition, existing detection systems often rely on multimodal data, limiting their applicability in contexts where only speech is available. To address this, we propose an annotation pipeline that leverages large language models (LLMs) to generate a sarcasm dataset. Using a publicly available sarcasm-focused podcast, we employ GPT-4o and LLaMA 3 for initial sarcasm annotations, followed by human verification to resolve disagreements. We validate this approach by comparing annotation quality and detection performance on a publicly available sarcasm dataset using a collaborative gating architecture. Finally, we introduce PodSarc, a large-scale sarcastic speech dataset created through this pipeline. The detection model achieves a 73.63% F1 score, demonstrating the dataset's potential as a benchmark for sarcasm detection research.
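
The annotation step described above (two LLMs label each sample, humans resolve disagreements) can be sketched as a simple agreement check. This is a minimal sketch, not the authors' code: it assumes binary sarcastic/non-sarcastic labels keyed by a hypothetical utterance ID, assumes the GPT-4o and LLaMA 3 labels have already been collected (for example, from transcripts), and uses a hypothetical `human_review` callback to stand in for the human verifier. The names `Utterance` and `merge_annotations` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Utterance:
    # Hypothetical record: an ID plus the text the LLMs were shown.
    utt_id: str
    transcript: str


def merge_annotations(
    utterances: list[Utterance],
    gpt4o_labels: dict[str, bool],
    llama3_labels: dict[str, bool],
    human_review: Callable[[Utterance], bool],
) -> tuple[dict[str, bool], list[str]]:
    """Keep labels where both LLMs agree; route disagreements to a human.

    Returns the final label per utterance and the IDs that were escalated.
    """
    final: dict[str, bool] = {}
    escalated: list[str] = []
    for utt in utterances:
        a = gpt4o_labels[utt.utt_id]
        b = llama3_labels[utt.utt_id]
        if a == b:
            # Both models agree: accept the shared label as-is.
            final[utt.utt_id] = a
        else:
            # Models disagree: the human verifier decides.
            escalated.append(utt.utt_id)
            final[utt.utt_id] = human_review(utt)
    return final, escalated


# Usage with toy data (not from the paper).
utts = [
    Utterance("u1", "Oh great, another Monday."),
    Utterance("u2", "The meeting starts at nine."),
    Utterance("u3", "Wow, what a brilliant idea."),
]
gpt = {"u1": True, "u2": False, "u3": True}
llama = {"u1": True, "u2": False, "u3": False}
labels, disputed = merge_annotations(utts, gpt, llama, human_review=lambda u: True)
print(labels)    # {'u1': True, 'u2': False, 'u3': True}
print(disputed)  # ['u3']
```

In this sketch, only the disputed items reach a human, so verification effort scales with the LLMs' disagreement rate rather than with the full dataset size.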

Problem

Research questions and challenges this paper addresses.

Addressing data scarcity in sarcasm detection from speech
Reducing reliance on multimodal data for sarcasm detection
Generating reliable sarcasm annotations using LLMs and human verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for sarcasm annotation
Human-verified GPT-4o and LLaMA 3 annotations
Creating the PodSarc dataset via a collaborative annotation pipeline