🤖 AI Summary
Prior research has not investigated the application of large language models (LLMs) to sentiment analysis on South African indigenous-language social media (e.g., Sepedi, Setswana), hindering real-time identification of sociopolitical challenges and evidence-informed policy responses in multilingual contexts.
Method: We conduct the first systematic evaluation of GPT-3.5, GPT-4, Llama-2, PaLM-2, and Dolly-2 for zero-shot cross-lingual sentiment classification, and propose a confidence-weighted multi-model ensemble strategy.
Contribution/Results: Evaluated on authentic multilingual South African social media data, our approach reduces sentiment classification error to under 1%, significantly outperforming single-model baselines across language and topic domains. It establishes a reproducible technical framework for low-resource language sentiment analysis and delivers high-fidelity, multilingual sentiment signals to support government monitoring of public opinion and formulation of inclusive social policies.
📝 Abstract
Sentiment analysis can aid in understanding people's opinions and emotions on social issues. In multilingual communities, sentiment analysis systems can quickly identify social challenges in social media posts, enabling government departments to detect and address these issues more precisely and effectively. Recently, large language models (LLMs) have become available to the general public, and initial analyses have shown that they exhibit impressive zero-shot sentiment analysis abilities in English. However, no work has investigated leveraging LLMs for sentiment analysis of social media posts in South African languages to detect social challenges. Consequently, in this work, we analyse the zero-shot performance of the state-of-the-art LLMs GPT-3.5, GPT-4, Llama 2, PaLM 2, and Dolly 2 to investigate the sentiment polarities of the 10 most prominent emerging topics in English, Sepedi, and Setswana social media posts that fall within the jurisdictional areas of 10 South African government departments. Our results demonstrate substantial differences between the various LLMs, topics, and languages. In addition, we show that fusing the outcomes of different LLMs yields large gains in sentiment classification performance, with sentiment classification errors below 1%. Consequently, it is now feasible to provide systems that generate reliable sentiment information to detect social challenges and draw conclusions about possible needs for action on specific topics and within different language groups.
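The confidence-weighted fusion of LLM outputs described above can be illustrated with a minimal sketch. The abstract does not specify the exact fusion rule, so the function name `fuse_predictions`, the label set, and the scheme of summing per-model confidence scores per label are illustrative assumptions, not the authors' published implementation.

```python
from collections import defaultdict

def fuse_predictions(predictions):
    """Combine per-model sentiment predictions into one label.

    predictions: list of (label, confidence) pairs, one per LLM,
    where label is e.g. "positive", "neutral", or "negative" and
    confidence lies in [0, 1]. Each model's confidence is added to
    its predicted label's score; the label with the highest total
    wins. (Hypothetical scheme for illustration only.)
    """
    scores = defaultdict(float)
    for label, confidence in predictions:
        scores[label] += confidence
    return max(scores, key=scores.get)

# Example: three models disagree; the confidence-weighted vote
# favours "positive" (total 1.5) over "negative" (total 0.55).
votes = [("positive", 0.90), ("negative", 0.55), ("positive", 0.60)]
print(fuse_predictions(votes))  # -> positive
```

In practice, the per-model confidences would come from token log-probabilities or self-reported scores, which vary in calibration across LLMs; the fused decision tends to be more robust than any single model's output, consistent with the gains reported above.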