Can Large Language Models Generate Effective Datasets for Emotion Recognition in Conversations?

📅 2025-08-07
🤖 AI Summary
Emotion recognition in conversations (ERC) suffers from data scarcity, strongly biased data sources, and high subjectivity in soft-label annotations; as a result, large language models (LLMs) remain underexploited for the task, particularly for synthetic data generation. This paper introduces the first ERC-specific dialogue data synthesis framework, which uses lightweight general-purpose LLMs and prompt engineering to produce six high-quality, diversity-enhanced synthetic datasets that alleviate data insufficiency and long-tail label distributions. Empirical evaluation shows statistically significant improvements (p < 0.01) in classification accuracy and robustness for mainstream ERC models across multiple benchmarks when the synthesized data supplements training. The generated data also enables interpretable analysis of label-imbalance effects. Key contributions: (i) empirical validation that small-scale LLMs are effective for ERC data generation; (ii) the first controllable, task-aware synthesis paradigm tailored to ERC; and (iii) the release of the first open-source collection of ERC-oriented synthetic datasets.

📝 Abstract
Emotion recognition in conversations (ERC) focuses on identifying emotion shifts within interactions, representing a significant step toward advancing machine intelligence. However, ERC data remains scarce, and existing datasets face numerous challenges due to their highly biased sources and the inherent subjectivity of soft labels. Even though Large Language Models (LLMs) have demonstrated their quality in many affective tasks, they are typically expensive to train, and their application to ERC tasks--particularly in data generation--remains limited. To address these challenges, we employ a small, resource-efficient, and general-purpose LLM to synthesize ERC datasets with diverse properties, supplementing the three most widely used ERC benchmarks. We generate six novel datasets, with two tailored to enhance each benchmark. We evaluate the utility of these datasets to (1) supplement existing datasets for ERC classification, and (2) analyze the effects of label imbalance in ERC. Our experimental results indicate that ERC classifier models trained on the generated datasets exhibit strong robustness and consistently achieve statistically significant performance improvements on existing ERC benchmarks.
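The abstract describes synthesizing labeled dialogues with a small general-purpose LLM via prompting. The paper's actual prompt wording, emotion inventory, and model are not given here; the template, emotion set, and turn count below are illustrative assumptions sketching how such a targeted-generation prompt could be composed:

```python
# Hedged sketch of a generation prompt for synthetic ERC data.
# EMOTIONS and the output format are hypothetical, not the paper's.
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "disgust", "neutral"]

def build_erc_prompt(target_emotion: str, n_turns: int = 6) -> str:
    """Compose a prompt asking a small LLM to write a short two-speaker
    dialogue whose final utterance expresses the target emotion, which
    is useful for generating extra samples of long-tail labels."""
    if target_emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {target_emotion}")
    return (
        f"Write a natural {n_turns}-turn dialogue between speakers A and B.\n"
        f"The final utterance must clearly express the emotion "
        f"'{target_emotion}'. Label every utterance on its own line in "
        f"the format: <speaker> | <utterance> | <emotion>."
    )

prompt = build_erc_prompt("fear")
```

Conditioning each prompt on a target label is one way a synthesis pipeline can be made "controllable" in the sense the summary describes; the resulting text would still need parsing and quality filtering before use as training data.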
Problem

Research questions and friction points this paper is trying to address.

Address scarcity of ERC datasets with biased sources
Explore LLMs for cost-effective ERC data generation
Evaluate generated datasets for ERC classification robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Use small LLM to synthesize diverse ERC datasets
Generate datasets to supplement existing ERC benchmarks
Enhance ERC classification with label imbalance analysis
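The label-imbalance analysis above presupposes a way to quantify the long tail of an ERC label distribution. A minimal sketch, using an illustrative head-to-tail ratio rather than any metric from the paper, could look like:

```python
from collections import Counter

def imbalance_report(labels):
    """Summarize a label long tail: per-class counts, the ratio of the
    most to least frequent class, and how many synthetic samples each
    class would need to match the head class (illustrative metric)."""
    counts = Counter(labels)
    most = max(counts.values())
    least = min(counts.values())
    deficit = {lab: most - c for lab, c in counts.items()}
    return {"counts": dict(counts),
            "imbalance_ratio": most / least,
            "deficit": deficit}

report = imbalance_report(["neutral"] * 8 + ["joy"] * 4 + ["fear"] * 2)
# report["imbalance_ratio"] == 4.0; "fear" needs 6 synthetic samples
```

The `deficit` counts could then drive per-label prompting quotas, so generation effort concentrates on the rarest emotions.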
Burak Can Kaplan
University of Hamburg
Artificial Intelligence
Hugo Cesar De Castro Carneiro
Department of Informatics, University of Hamburg, Hamburg 22527, Germany
Stefan Wermter
Department of Informatics, University of Hamburg, Hamburg 22527, Germany