🤖 AI Summary
To address low sentiment classification accuracy on Indonesian e-commerce reviews, this study systematically evaluates two data augmentation strategies, back-translation and synonym replacement, for enhancing text representations in low-resource language settings, using IndoBERT and DistilBERT as base models. Experimental results show that IndoBERT, at 80.2% accuracy, significantly outperforms DistilBERT. Back-translation yields the most consistent performance gain (+3.1 percentage points), confirming its utility in low-resource scenarios, whereas synonym replacement delivers only marginal improvement and ensemble methods confer no substantial benefit. This work provides the first quantitative analysis of the interplay between pre-trained model selection and data augmentation in Indonesian e-commerce sentiment analysis. It establishes a reproducible methodology for fine-grained sentiment classification in low-resource languages, offering empirically grounded guidance for model architecture and data augmentation decisions.
📝 Abstract
Understanding emotions expressed in Indonesian is essential for improving customer experiences in e-commerce. This study focuses on improving the accuracy of emotion classification in Indonesian using two pre-trained language models, IndoBERT and DistilBERT. A key component of our approach was data augmentation, including back-translation and synonym replacement, which played a significant role in boosting the models' performance. After hyperparameter tuning, IndoBERT achieved an accuracy of 80%, demonstrating the impact of careful data processing. Combining multiple IndoBERT models yielded only a slight improvement that was not significant. Our findings indicate that IndoBERT was the most effective model for emotion classification in Indonesian, with data augmentation proving to be a vital factor in achieving high accuracy. Future research should explore alternative architectures and strategies to improve generalization for Indonesian NLP tasks.
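As a rough illustration of the synonym replacement idea mentioned above, the following is a minimal sketch and not the study's implementation. The synonym table, the function names `synonym_replace` and `augment`, the replacement probability `p`, and the example labels are all assumptions made for illustration; the abstract does not specify the lexicon or the sampling scheme actually used.

```python
import random

# Hypothetical toy synonym table (assumption); the study's lexicon is not specified.
SYNONYMS = {
    "bagus": ["baik", "mantap"],
    "cepat": ["kilat", "lekas"],
    "buruk": ["jelek"],
}


def synonym_replace(text, synonyms, p=0.5, seed=None):
    """Swap each word found in `synonyms` for a random synonym with probability p."""
    rng = random.Random(seed)
    out = []
    for token in text.split():
        candidates = synonyms.get(token.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(token)
    return " ".join(out)


def augment(dataset, synonyms, p=0.5, seed=0):
    """Return the original (text, label) pairs plus one augmented copy per pair.

    The label is carried over unchanged; copies identical to the source text
    are skipped so the dataset gains no exact duplicates.
    """
    augmented = list(dataset)
    for i, (text, label) in enumerate(dataset):
        new_text = synonym_replace(text, synonyms, p=p, seed=seed + i)
        if new_text != text:
            augmented.append((new_text, label))
    return augmented
```

Calling `augment(train_pairs, SYNONYMS)` on a list of `(review_text, emotion_label)` tuples would return the original pairs followed by their augmented variants. Whitespace tokenization is a simplification here; a real pipeline would likely handle punctuation and Indonesian morphology before lookup.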