Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the multidimensional behavioral alignment of large language models (LLMs) with human participants in adversarial conflict-resolution dialogues, focusing on linguistic style, dynamic anger expression, and strategic behavior. We propose the first multidimensional behavioral alignment evaluation framework specifically designed for conflict dialogue, innovatively incorporating Five-Factor Model (FFM) personality prompts to capture individual differences and establishing a comparable benchmark between LLM and human negotiation behaviors. Using models including GPT-4.1 and Claude-3.7-Sonnet, we integrate personality-guided prompting, multi-turn simulation, emotion trajectory modeling, and quantitative strategy analysis. Results indicate that GPT-4.1 achieves the closest alignment with humans in linguistic and affective dimensions, while Claude-3.7-Sonnet exhibits superior strategic behavior; however, all models exhibit substantial deficits in cross-dimensional consistency. This work provides both a theoretical foundation and an evaluation paradigm for the trustworthy deployment of LLMs in high-stakes, socially sensitive interpersonal interactions.

📝 Abstract
Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks, yet their ability to mirror human behavior in emotionally and strategically complex contexts remains underexplored. This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution by simulating multi-turn conflict dialogues that incorporate negotiation. Each LLM is guided by a matched Five-Factor personality profile to control for individual variation and enhance realism. We evaluate alignment across three dimensions: linguistic style, emotional expression (e.g., anger dynamics), and strategic behavior. GPT-4.1 achieves the closest alignment with humans in linguistic style and emotional dynamics, while Claude-3.7-Sonnet best reflects strategic behavior. Nonetheless, substantial alignment gaps persist. Our findings establish a benchmark for alignment between LLMs and humans in socially complex interactions, underscoring both the promise and the limitations of personality conditioning in dialogue modeling.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLM behavioral alignment with humans in adversarial dispute resolution dialogues
Evaluating alignment across linguistic style, emotional expression, and strategic behavior dimensions
Establishing benchmarks for personality-conditioned LLMs in socially complex interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Personality-prompted LLMs simulate multi-turn conflict dialogues
Multi-dimensional evaluation across linguistic, emotional, and strategic dimensions
GPT-4.1 aligns most closely with humans linguistically; Claude-3.7-Sonnet strategically
Deuksin Kwon
University of Southern California, USC Institute for Creative Technologies
Kaleen Shrestha
University of Southern California, USC Institute for Creative Technologies
Bin Han
University of Southern California, USC Institute for Creative Technologies
Elena Hayoung Lee
University of Southern California
Gale M. Lucas
USC Institute for Creative Technologies