Leveraging IndoBERT and DistilBERT for Indonesian Emotion Classification in E-Commerce Reviews

📅 2025-09-18
🤖 AI Summary
To address low emotion classification accuracy on Indonesian e-commerce reviews, this study evaluates two data augmentation strategies, back-translation and synonym replacement, for enhancing text representations in low-resource language settings, using IndoBERT and DistilBERT as base models. Experimental results show that IndoBERT, at 80.2% accuracy, outperforms DistilBERT. Back-translation yields the most consistent performance gain (+3.1 percentage points), which supports its usefulness in low-resource scenarios, whereas synonym replacement delivers only marginal improvement and ensemble methods confer no substantial benefit. The work offers a quantitative analysis of the interplay between pre-trained model selection and data augmentation for Indonesian e-commerce emotion classification, along with a reproducible methodology for fine-grained emotion classification in low-resource languages, giving empirically grounded guidance for model and data augmentation decisions.

📝 Abstract
Understanding emotions in the Indonesian language is essential for improving customer experiences in e-commerce. This study focuses on enhancing the accuracy of emotion classification in Indonesian by leveraging advanced language models, IndoBERT and DistilBERT. A key component of our approach was data processing, specifically data augmentation, which included techniques such as back-translation and synonym replacement. These methods played a significant role in boosting the model's performance. After hyperparameter tuning, IndoBERT achieved an accuracy of 80%, demonstrating the impact of careful data processing. While combining multiple IndoBERT models led to a slight improvement, it did not significantly enhance performance. Our findings indicate that IndoBERT was the most effective model for emotion classification in Indonesian, with data augmentation proving to be a vital factor in achieving high accuracy. Future research should focus on exploring alternative architectures and strategies to improve generalization for Indonesian NLP tasks.
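The abstract notes that combining multiple IndoBERT models gave only a slight improvement. It does not say how the models were combined, so the following is a minimal sketch of one common option, soft voting, where the class probabilities from each model are averaged before taking the top class. The label names and probability values are hypothetical and stand in for the outputs of fine-tuned classifiers.

```python
def soft_vote(prob_rows):
    """Average class-probability vectors from several models.

    prob_rows: list of per-model probability lists, all of the same length.
    Returns (index of the top class, averaged probabilities).
    """
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


# Hypothetical emotion labels; the label set used in the paper is not stated here.
LABELS = ["anger", "fear", "happy", "love", "sadness"]

# Made-up probabilities for one review from three hypothetical fine-tuned models.
model_probs = [
    [0.10, 0.05, 0.60, 0.20, 0.05],
    [0.05, 0.05, 0.40, 0.45, 0.05],
    [0.10, 0.10, 0.50, 0.25, 0.05],
]

best, avg = soft_vote(model_probs)
print(LABELS[best], [round(p, 3) for p in avg])
```

In this toy case the second model prefers a different class than the other two, but the averaged vote follows the majority. When the member models are copies of the same architecture trained on the same data, their errors tend to overlap, which is one plausible reason an ensemble adds little.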
Problem

Research questions and friction points this paper is trying to address.

Enhancing Indonesian emotion classification accuracy
Leveraging IndoBERT and DistilBERT models
Applying data augmentation techniques to improve performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

IndoBERT and DistilBERT for emotion classification
Data augmentation with back-translation and synonym replacement
Hyperparameter tuning, after which IndoBERT reached 80% accuracy
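As an illustration of the synonym replacement technique listed above, here is a minimal sketch that swaps words for synonyms with a fixed probability. The synonym table, the `synonym_replace` function, and the sample review are all hypothetical; the paper's actual lexicon and replacement policy are not described on this page.

```python
import random

# Hypothetical toy synonym table; a real setup would draw on an Indonesian thesaurus.
SYNONYMS = {
    "bagus": ["baik", "mantap"],
    "cepat": ["kilat", "lekas"],
    "barang": ["produk"],
}


def synonym_replace(text, synonyms, p=0.3, seed=None):
    """Return a copy of `text` in which each word that has an entry in
    `synonyms` is replaced by a randomly chosen synonym with probability p."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        candidates = synonyms.get(word.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


review = "barang bagus dan pengiriman cepat"
# With p=1.0 every word that has a synonym is replaced; other words are kept.
print(synonym_replace(review, SYNONYMS, p=1.0, seed=0))
```

Each augmented sentence keeps the label of the review it came from. A low replacement probability limits the risk that a swapped word shifts the emotion the review expresses.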
William Christian
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
Daniel Adamlu
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
Adrian Yu
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
Derwin Suhartono
Computer Science Department, Bina Nusantara University
Artificial Intelligence, Computational Linguistics, Personality Recognition