Backtranslation and paraphrasing in the LLM era? Comparing data augmentation methods for emotion classification

📅 2025-07-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address data scarcity and class imbalance in emotion classification, this paper systematically evaluates three large language model (LLM)-based data augmentation paradigms (backtranslation, paraphrasing, and zero-/few-shot generation) using GPT-series models. Experiments show that LLM-driven backtranslation and paraphrasing consistently match or exceed purely generative augmentation across most settings, while incurring lower computational cost and offering greater controllability. The key contribution is the finding that lightweight, prompt-guided transformation of existing examples, rather than end-to-end generation, suffices to significantly improve the robustness and generalization of supervised classifiers. This establishes a reproducible, cost-effective augmentation strategy for low-resource emotion classification, bridging the gap between practical deployability and performance in resource-constrained scenarios.

📝 Abstract
Numerous domain-specific machine learning tasks struggle with data scarcity and class imbalance. This paper systematically explores data augmentation methods for NLP, particularly through large language models like GPT. The purpose of this paper is to examine whether traditional methods such as paraphrasing and backtranslation can leverage a new generation of models to achieve performance comparable to purely generative methods. We selected methods that address data scarcity using ChatGPT, along with a representative dataset. We conducted a series of experiments comparing four different approaches to data augmentation in multiple experimental setups. We then evaluated the results both in terms of the quality of the generated data and its impact on classification performance. The key findings indicate that backtranslation and paraphrasing can yield comparable or even better results than zero- and few-shot generation of examples.
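The backtranslation approach compared in the abstract can be sketched as below. This is a minimal illustration, not the paper's implementation: `translate_fn` is a hypothetical stand-in for any translation call, for example a prompt sent to a GPT-series model asking it to translate the text.

```python
# Back-translation augmentation sketch: translate each example into a pivot
# language and back into English to obtain a label-preserving paraphrase.
# `translate_fn` is any callable (text, source, target) -> text.

def backtranslate(text: str, translate_fn, pivot: str = "de") -> str:
    """Translate text into a pivot language and back to get a paraphrase."""
    pivoted = translate_fn(text, source="en", target=pivot)
    return translate_fn(pivoted, source=pivot, target="en")

def augment(dataset, translate_fn, pivots=("de", "fr")):
    """Return the original (text, label) pairs plus one
    back-translated copy per pivot language, with labels unchanged."""
    augmented = list(dataset)
    for text, label in dataset:
        for pivot in pivots:
            augmented.append((backtranslate(text, translate_fn, pivot), label))
    return augmented
```

With two pivot languages, each labeled example yields two additional paraphrased copies, so a dataset of N examples grows to 3N without any purely generative step.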
Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in emotion classification tasks
Comparing traditional vs. generative data augmentation methods
Evaluating LLM-enhanced backtranslation and paraphrasing performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizing ChatGPT for data augmentation
Comparing backtranslation and paraphrasing methods
Evaluating impact on classification performance
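The zero-/few-shot generation baseline that the paper compares against can be sketched as a prompt-construction step. The prompt wording below is illustrative only, not the authors' actual prompt, and `build_fewshot_prompt` is a hypothetical helper name.

```python
# Sketch of a few-shot generation prompt for synthesizing new labeled
# examples for an emotion class (illustrative wording, not the paper's prompt).

def build_fewshot_prompt(label: str, examples: list, n_new: int = 3) -> str:
    """Assemble a prompt asking an LLM to generate n_new new texts
    expressing the emotion `label`, seeded with labeled examples."""
    shots = "\n".join(f"- {e}" for e in examples)
    return (
        f"Here are example sentences expressing the emotion '{label}':\n"
        f"{shots}\n"
        f"Write {n_new} new, distinct sentences expressing the same emotion, "
        f"one per line."
    )
```

The resulting string would be sent to the LLM, and each returned line paired with `label` as a synthetic training example; passing an empty example list degrades this to the zero-shot variant.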
Łukasz Radliński
Department of Artificial Intelligence, Wrocław University of Science and Technology, Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland
Mateusz Guściora
Department of Artificial Intelligence, Wrocław University of Science and Technology, Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland
Jan Kocoń
Department of Artificial Intelligence, Wrocław University of Science and Technology
Artificial Intelligence · Natural Language Processing · Large Language Models · Transformers · Personalized NLP