🤖 AI Summary
This work addresses the limited robustness of large language models (LLMs) in sentiment analysis, particularly toward pragmatic phenomena such as sarcasm, emojis, and fragmented language, and their poor generalization to domain-specific corpora (e.g., nuclear energy). We propose a synergistic optimization framework integrating text paraphrasing, sarcasm detection and removal, adversarial augmentation, and domain-adaptive fine-tuning. We construct a high-quality, human-labeled dataset of 5,929 tweets covering varied sarcasm contexts. We introduce a joint strategy of sarcasm removal plus fine-tuning on general-domain tweets, and empirically validate the critical contribution of general-domain corpora to sarcasm comprehension. Experiments show that sarcasm removal improves sentiment accuracy by up to 21 percentage points (from 30% to 51%) for models fine-tuned on nuclear-power content; fine-tuning on general tweet data instead reaches 60% sentiment accuracy on sarcastic tweets; adversarial augmentation raises accuracy on sarcastic tweets to approximately 85%; and paraphrasing converts about 40% of low-confidence predictions into high-confidence ones, boosting overall sentiment accuracy by 6%.
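The confidence-gated paraphrasing step can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classify_sentiment` and `paraphrase` are toy stand-ins for the fine-tuned LLM classifier and the LLM paraphraser, whose actual prompts and models are not shown here, and the 0.7 threshold is a hypothetical cutoff.

```python
CONFIDENCE_THRESHOLD = 0.7  # hypothetical cutoff for "low-confidence" labels

def classify_sentiment(text):
    """Toy stand-in for the LLM sentiment classifier: returns (label, confidence).
    Here, heavily fragmented text (many very short tokens) yields low confidence."""
    tokens = text.split()
    fragmented = sum(len(t) <= 2 for t in tokens) / max(len(tokens), 1)
    confidence = 1.0 - fragmented
    label = "positive" if "good" in text.lower() else "negative"
    return label, confidence

def paraphrase(text):
    """Toy stand-in for the LLM paraphraser that rewrites a fragmented tweet
    into a fluent sentence (the paper's actual prompt is an assumption here)."""
    return "This is a rephrased, fluent version of: " + text

def classify_with_paraphrase_fallback(text):
    """Classify; if the label is low-confidence, paraphrase and reclassify."""
    label, conf = classify_sentiment(text)
    if conf < CONFIDENCE_THRESHOLD:
        label, conf = classify_sentiment(paraphrase(text))
    return label, conf
```

The design point is that paraphrasing is applied only where the classifier is unsure, which is how roughly 40% of low-confidence labels can be upgraded without touching the rest of the data.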
📝 Abstract
Large Language Models (LLMs) have demonstrated impressive performance across various tasks, including sentiment analysis. However, data quality, particularly when sourced from social media, can significantly impact their accuracy. This research explores how textual nuances, including emojis and sarcasm, affect sentiment analysis, with a particular focus on improving data quality through text paraphrasing techniques. To address the lack of labeled sarcasm data, the authors created a human-labeled dataset of 5,929 tweets, enabling the assessment of LLMs across various sarcasm contexts. The results show that when topic-specific datasets, such as those related to nuclear power, are used to fine-tune LLMs, the resulting models cannot accurately comprehend sentiment in the presence of sarcasm because of the less diverse text, requiring external interventions such as sarcasm removal to boost model accuracy. Sarcasm removal led to up to a 21-percentage-point improvement in sentiment accuracy, as LLMs trained on nuclear-power-related content struggled with sarcastic tweets, achieving only 30% accuracy. In contrast, LLMs trained on general tweet datasets, covering a broader range of topics, showed considerable improvements in predicting sentiment for sarcastic tweets (60% accuracy), indicating that incorporating general text data can enhance sarcasm comprehension. The study also utilized adversarial text augmentation, showing that creating synthetic text variants through minor edits significantly increased model robustness and accuracy on sarcastic tweets (approximately 85%). Additionally, paraphrasing tweets with fragmented language transformed around 40% of low-confidence labels into high-confidence ones, improving the LLMs' sentiment analysis accuracy by 6%.
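The adversarial augmentation described above, creating synthetic variants of each tweet via minor edits, can be sketched as below. The specific edit operations (swap, drop, duplicate a character) and the variant count are illustrative assumptions; the paper's exact perturbation recipe is not reproduced here.

```python
import random

def perturb(text, rng, n_edits=1):
    """Create a synthetic variant of `text` by applying small character-level
    edits: swap two adjacent characters, drop one, or duplicate one."""
    chars = list(text)
    for _ in range(n_edits):
        i = rng.randrange(len(chars))
        op = rng.choice(["swap", "drop", "dup"])
        if op == "swap" and i < len(chars) - 1:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        elif op == "drop" and len(chars) > 1:
            del chars[i]
        else:  # duplicate (also the fallback when swap/drop are not applicable)
            chars.insert(i, chars[i])
    return "".join(chars)

def augment(labeled_tweets, variants_per_tweet=3, seed=0):
    """Expand a labeled training set with perturbed copies.
    Each variant inherits the label of the original tweet."""
    rng = random.Random(seed)
    out = []
    for text, label in labeled_tweets:
        out.append((text, label))
        for _ in range(variants_per_tweet):
            out.append((perturb(text, rng), label))
    return out
```

Training on such variants exposes the model to the typo-like noise common in tweets, which is the mechanism behind the robustness gain on sarcastic inputs reported above.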