🤖 AI Summary
This study investigates the controllability and human alignment of large language models (LLMs)—specifically GPT-4 and LLaMA-3—in keyword-driven sentence generation with respect to emotional semantics. To address this, we systematically compare four emotion representations—emotion words, VAD numerical scores, VAD lexicalized terms, and emojis—integrating prompt engineering, multidimensional emotion annotation (Valence-Arousal-Dominance), and human evaluation. Our key findings are: (1) emotion-word representations significantly outperform numerical VAD in both accuracy and naturalness, better aligning with human judgments; (2) we propose a novel VAD-to-lexical mapping method that substantially improves human–model consistency; and (3) representation efficacy is highly contingent on model architecture, emotion category, and representation format. Collectively, these results establish an interpretable, lightweight, and deployable representational optimization paradigm for controllable affective text generation.
📝 Abstract
In controlled text generation with large language models (LLMs), gaps arise between the model's interpretation of a control signal and human expectations. We study the problem of controlling emotion in keyword-based sentence generation for both GPT-4 and LLaMA-3. We compare four emotion representations: emotion words, Valence-Arousal-Dominance (VAD) dimensions expressed in both lexical and numeric forms, and emojis. Our human evaluation measures human–LLM alignment for each representation, as well as the accuracy and realism of the generated sentences. Although representations like VAD decompose emotions into easy-to-compute components, our findings show that people agree more with what LLMs generate when conditioned on English words (e.g., "angry") rather than on VAD scales. This gap is especially pronounced for numeric VAD. However, converting the originally numeric VAD scales into lexical scales (e.g., +4.0 becomes "High") dramatically improved agreement. Furthermore, how strongly a generated sentence is perceived to convey an emotion depends heavily on the LLM, the representation type, and the specific emotion.
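The numeric-to-lexical conversion described above can be sketched as a simple binning function. This is a minimal illustration, not the paper's actual method: the score range, the cutoff values, and the three-level label set (`Low` / `Medium` / `High`) are all assumptions made for the example.

```python
def vad_to_lexical(score: float, low_cut: float = -1.5, high_cut: float = 1.5) -> str:
    """Bin a numeric VAD score into a lexical level.

    Assumes a symmetric score range (here roughly [-5, +5]) and
    hypothetical cutoffs; the paper's exact mapping may differ.
    """
    if score <= low_cut:
        return "Low"
    if score >= high_cut:
        return "High"
    return "Medium"


# Example: turn a numeric VAD triple into a lexical description
# suitable for a generation prompt ("+4.0 becomes High").
vad = {"Valence": -2.5, "Arousal": 4.0, "Dominance": 0.3}
lexical = {dim: vad_to_lexical(s) for dim, s in vad.items()}
print(lexical)  # {'Valence': 'Low', 'Arousal': 'High', 'Dominance': 'Medium'}
```

Conditioning the LLM on the resulting labels ("Valence: Low, Arousal: High, ...") rather than raw numbers is the change the abstract reports as dramatically improving human–model agreement.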