🤖 AI Summary
This study investigates how emotional expressions in user prompts affect the performance of large language models across diverse tasks. Focusing on six task categories (including mathematical reasoning, medical question answering, and reading comprehension), the work systematically evaluates first-person emotional prefixes and finds that their effects are highly task-dependent, with the greatest variability observed in socially grounded reasoning tasks. To address this variability, the paper introduces EmotionRL, a reinforcement learning–based adaptive prompting framework that selects an emotional expression for each query. Experimental results show that while static emotional prompts yield only marginal benefits, EmotionRL achieves more consistent and reliable gains across tasks.
📝 Abstract
Emotional tone is pervasive in human communication, yet its influence on large language model (LLM) behaviour remains unclear. Here, we examine how first-person emotional framing in user-side queries affects LLM performance across six benchmark domains, including mathematical reasoning, medical question answering, reading comprehension, commonsense reasoning, and social inference. Across models and tasks, static emotional prefixes usually produce only small changes in accuracy, suggesting that affective phrasing is typically a mild perturbation rather than a reliable general-purpose intervention. This stability is not uniform: effects are more variable in socially grounded tasks, where emotional context more plausibly interacts with interpersonal reasoning. Additional analyses show that stronger emotional wording induces only modest further change, and that human-written prefixes reproduce the same qualitative pattern as LLM-generated ones. We then introduce EmotionRL, an adaptive prompting framework that selects an emotional framing for each query. Although no single emotion is consistently beneficial, adaptive selection yields more reliable gains than fixed emotional prompting. Together, these findings show that emotional tone is neither a dominant driver of LLM performance nor irrelevant noise, but a weak, input-dependent signal that can be exploited through adaptive control.
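The abstract does not specify EmotionRL's policy architecture, but the core idea of adaptively selecting an emotional framing per query can be illustrated with a minimal sketch. Below is an epsilon-greedy bandit over a small set of hypothetical emotional prefixes: the prefix texts, the `EmotionBandit` class, and the reward-as-running-mean update are all illustrative assumptions, not the paper's actual method.

```python
import random

# Hypothetical emotional prefixes; the paper's actual prompt set is not
# given in the abstract, so these are illustrative stand-ins.
EMOTIONS = {
    "neutral": "",
    "anxious": "I'm really worried about getting this right. ",
    "hopeful": "I'm hopeful you can help me with this. ",
}


class EmotionBandit:
    """Epsilon-greedy selector over emotional framings (an assumption:
    one simple instance of 'adaptive selection', not EmotionRL itself)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}   # pulls per framing
        self.values = {a: 0.0 for a in self.arms} # running mean reward
        self.rng = random.Random(seed)

    def select(self):
        # Explore a random framing with probability epsilon,
        # otherwise exploit the framing with the best mean reward so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental running-mean update of the chosen framing's value.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def frame_query(query, emotion):
    """Prepend the selected emotional prefix to the user query."""
    return EMOTIONS[emotion] + query
```

In use, the reward would come from task accuracy on the framed query (e.g. 1.0 for a correct answer, 0.0 otherwise), so the selector gradually concentrates on whichever framing helps on a given task, matching the abstract's observation that no single emotion is consistently beneficial.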