GPT's Devastated and LLaMA's Content: Emotion Representation Alignment in LLMs for Keyword-based Generation

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the controllability and human alignment of large language models (LLMs)—specifically GPT-4 and LLaMA-3—in keyword-driven sentence generation with respect to emotional semantics. To address this, we systematically compare four emotion representations—emotion words, VAD numerical scores, VAD lexicalized terms, and emojis—integrating prompt engineering, multidimensional emotion annotation (Valence-Arousal-Dominance), and human evaluation. Our key findings are: (1) emotion-word representations significantly outperform numerical VAD in both accuracy and naturalness, better aligning with human judgments; (2) we propose a novel VAD-to-lexical mapping method that substantially improves human–model consistency; and (3) representation efficacy is highly contingent on model architecture, emotion category, and representation format. Collectively, these results establish an interpretable, lightweight, and deployable representational optimization paradigm for controllable affective text generation.

📝 Abstract
In controlled text generation using large language models (LLMs), gaps arise between the language model's interpretation and human expectations. We look at the problem of controlling emotions in keyword-based sentence generation for both GPT-4 and LLaMA-3. We selected four emotion representations: Words, Valence-Arousal-Dominance (VAD) dimensions expressed in both Lexical and Numeric forms, and Emojis. Our human evaluation looked at the Human-LLM alignment for each representation, as well as the accuracy and realism of the generated sentences. While representations like VAD break emotions into easy-to-compute components, our findings show that people agree more with how LLMs generate when conditioned on English words (e.g., "angry") rather than VAD scales. This difference is especially visible when comparing Numeric VAD to words. However, we found that converting the originally-numeric VAD scales to Lexical scales (e.g., +4.0 becomes "High") dramatically improved agreement. Furthermore, the perception of how much a generated sentence conveys an emotion is highly dependent on the LLM, the representation type, and which emotion it is.
Problem

Research questions and friction points this paper is trying to address.

Aligning emotion representation in LLMs with human expectations
Comparing emotion control in GPT-4 and LLaMA-3 for keyword-based generation
Evaluating accuracy and realism of generated sentences using different emotion representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emotion control using keyword-based generation
Comparison of GPT-4 and LLaMA-3 emotion representations
Improved alignment by converting numeric VAD to lexical scales
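The numeric-to-lexical VAD conversion above (e.g., +4.0 becomes "High") can be sketched as a simple binning function. The thresholds, label set, and score range below are illustrative assumptions for this sketch, not the authors' exact mapping.

```python
# Sketch of a VAD-to-lexical mapping: each Valence-Arousal-Dominance
# score is binned into a coarse lexical label before being placed in
# the generation prompt. Bin edges here are assumed, not the paper's.

def vad_to_lexical(score: float) -> str:
    """Map a numeric VAD score (assumed range -5.0 to +5.0) to a lexical label."""
    if score >= 3.0:
        return "High"
    if score >= 1.0:
        return "Moderately High"
    if score > -1.0:
        return "Neutral"
    if score > -3.0:
        return "Moderately Low"
    return "Low"

def describe_emotion(valence: float, arousal: float, dominance: float) -> str:
    """Render a VAD triple as lexical terms suitable for a prompt."""
    return (f"Valence: {vad_to_lexical(valence)}, "
            f"Arousal: {vad_to_lexical(arousal)}, "
            f"Dominance: {vad_to_lexical(dominance)}")

# e.g. a hypothetical "angry"-like VAD triple
print(describe_emotion(-3.2, 4.0, 2.1))
# → Valence: Low, Arousal: High, Dominance: Moderately High
```

A prompt built from these lexical terms, rather than raw numbers, is the representation the paper reports as substantially better aligned with human judgments.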
Shadab Choudhury
Computer Science and Electrical Engineering Department, University of Maryland, Baltimore County
Asha Kumar
Information Systems Department, University of Maryland, Baltimore County
Lara J. Martin
Assistant Professor, University of Maryland, Baltimore County
Narrative Generation · Speech Processing · Artificial Intelligence · Computational Creativity · AAC