🤖 AI Summary
This study addresses the long-overlooked role of the Repetitive Lengthening Form (RLF) in sentiment analysis, where its impact on model comprehension and expression remains unclear. The work presents the first systematic investigation of RLFs in this context, introducing Lengthening, the first multi-domain dataset specifically curated for RLFs, and proposes a unified framework to quantitatively assess language models' capacity to interpret informal expressions. It also develops ExpInstruct, a two-stage explainable instruction-tuning approach that, with limited samples, enables open-source large language models to match the zero-shot performance of GPT-4 while substantially enhancing their ability to explain RLFs. Experimental results demonstrate that RLF sentences serve as salient indicators of document-level sentiment.
📝 Abstract
Individuals engaging in online communication frequently express personal opinions in informal styles (e.g., memes and emojis). While Language Models (LMs) applied to informal communication have been widely discussed, a unique and emphatic style, the Repetitive Lengthening Form (RLF), has been overlooked for years. In this paper, we explore answers to two research questions: 1) Is RLF important for sentiment analysis (SA)? 2) Can LMs understand RLF? Inspired by previous linguistic research, we curate **Lengthening**, the first multi-domain dataset with 850k samples focused on RLF for SA. Moreover, we introduce **Exp**lainable **Instruct**ion Tuning (**ExpInstruct**), a two-stage instruction tuning framework aimed at improving both the performance and explainability of LLMs for RLF. We further propose a novel unified approach to quantify LMs' understanding of informal expressions. We show that RLF sentences are highly expressive and can serve as signatures of document-level sentiment; RLF also holds potential value for online content analysis. Our results show that fine-tuned Pre-trained Language Models (PLMs) can surpass zero-shot GPT-4 in performance but not in explanation for RLF. Finally, we show that ExpInstruct can improve open-source LLMs to match zero-shot GPT-4 in both performance and explainability for RLF with limited samples. Code and sample data are available at https://github.com/Tom-Owl/OverlookedRLF