Conversations: Love Them, Hate Them, Steer Them

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) exhibit rigid, non-humanlike emotional expression in dialogue, lacking fine-grained affective nuance and consistent personality. To address this, we propose a fine-tuning-free, interpretable, fine-grained method for emotion-controllable dialogue regulation. Our core innovation is an emotion vector construction mechanism that combines attribution patching with contrastive activation differences, enabling targeted intervention in the hidden state space of LLaMA 3.1-8B. This approach supports precise, independent control over the intensities of positive emotions (e.g., joy, trust) and the frequency of first-person pronoun usage, significantly enhancing response empathy and persona consistency. Experimental results show that the intervention is both interpretable, via causal attribution, and reproducible across prompts and contexts. Our method establishes a new paradigm for imbuing LLMs with controllable, human-aligned affective intelligence without architectural modification or parameter updates.

📝 Abstract
Large Language Models (LLMs) demonstrate increasing conversational fluency, yet instilling them with nuanced, human-like emotional expression remains a significant challenge. Current alignment techniques often address only surface-level output or require extensive fine-tuning. This paper demonstrates that targeted activation engineering can steer LLaMA 3.1-8B to exhibit more human-like emotional nuances. We first employ attribution patching, observing activation patterns during diagnostic conversational tasks, to identify causally influential components and locate a key intervention locus. We then derive emotional expression vectors from the difference in activations generated by contrastive text pairs (positive vs. negative examples of target emotions). Applying these vectors to new conversational prompts significantly enhances emotional characteristics: steered responses show increased positive sentiment (e.g., joy, trust) and more frequent first-person pronoun usage, indicative of greater personal engagement. Our findings offer a precise and interpretable method for controlling specific emotional attributes in LLMs, contributing to the development of more aligned and empathetic conversational AI.
Problem

Research questions and friction points this paper is trying to address.

Enhancing human-like emotional expression in LLMs
Identifying key intervention points via activation patterns
Controlling emotional attributes with interpretable methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Targeted activation engineering steers emotional expression
Attribution patching identifies key intervention components
Contrastive text pairs derive emotional expression vectors
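The steering recipe described above (contrastive activation differences plus additive intervention at a chosen layer) can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: a small stand-in module replaces the LLaMA 3.1-8B layer that attribution patching would select, and random tensors stand in for hidden activations of positive/negative emotion examples. The names `ToyBlock`, `emotion_vec`, and `ALPHA` are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
HIDDEN = 16

class ToyBlock(nn.Module):
    """Stand-in for the transformer layer selected via attribution patching."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, HIDDEN)

    def forward(self, x):
        return torch.tanh(self.proj(x))

block = ToyBlock()

# In the real pipeline, contrastive text pairs are run through the model and
# hidden states are collected; here random tensors play that role (assumption).
pos_acts = torch.randn(8, HIDDEN) + 0.5   # e.g. joyful example responses
neg_acts = torch.randn(8, HIDDEN) - 0.5   # e.g. flat/negative example responses

# Emotion vector = mean activation difference between the contrastive sets.
emotion_vec = pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

# Steering: add the scaled vector to the layer's output via a forward hook,
# leaving all weights untouched (no fine-tuning, no architectural change).
ALPHA = 4.0  # intensity knob; scaling per emotion gives fine-grained control

def steer_hook(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return output + ALPHA * emotion_vec

handle = block.register_forward_hook(steer_hook)
x = torch.randn(1, HIDDEN)
steered = block(x)       # hook active
handle.remove()
baseline = block(x)      # hook removed

# The steered output shifts exactly along the emotion direction.
print(torch.allclose(steered - baseline, ALPHA * emotion_vec.expand_as(steered)))
```

Because the intervention is a single additive vector at one layer, its effect is directly inspectable (the output shift equals the scaled vector), which mirrors the interpretability claim in the abstract.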