🤖 AI Summary
This study investigates the multidimensional behavioral alignment of large language models (LLMs) with human participants in adversarial conflict-resolution dialogues, focusing on linguistic style, dynamic anger expression, and strategic behavior. We propose the first multidimensional behavioral alignment evaluation framework designed specifically for conflict dialogue: it incorporates Five-Factor Model (FFM) personality prompts to capture individual differences and establishes a comparable benchmark between LLM and human negotiation behaviors. Using models including GPT-4.1 and Claude-3.7-Sonnet, we integrate personality-guided prompting, multi-turn simulation, emotion-trajectory modeling, and quantitative strategy analysis. Results indicate that GPT-4.1 achieves the closest alignment with humans in the linguistic and affective dimensions, while Claude-3.7-Sonnet best matches human strategic behavior; all models, however, show substantial deficits in cross-dimensional consistency. This work provides both a theoretical foundation and an evaluation paradigm for the trustworthy deployment of LLMs in high-stakes, socially sensitive interpersonal interactions.
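To make the personality-conditioning step concrete, here is a minimal sketch of how an FFM-matched persona prompt might drive one turn of a simulated conflict dialogue. The trait scale, dispute scenario, prompt wording, and OpenAI client usage are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of Five-Factor Model (FFM) personality-conditioned prompting
# for a multi-turn conflict dialogue. The scenario and prompt template are
# illustrative assumptions, not the study's actual materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# An FFM profile on a 1-5 scale, matched to a human participant's scores.
ffm_profile = {
    "openness": 4, "conscientiousness": 2, "extraversion": 5,
    "agreeableness": 1, "neuroticism": 4,
}

def personality_system_prompt(profile: dict[str, int]) -> str:
    """Render an FFM profile as natural-language role instructions."""
    traits = ", ".join(f"{name}: {score}/5" for name, score in profile.items())
    return (
        "You are a tenant in a dispute with your landlord over a withheld "
        f"security deposit. Your personality profile is ({traits}). "
        "Stay in character: let these traits shape your tone, emotional "
        "reactions, and negotiation moves across the whole conversation."
    )

messages = [{"role": "system", "content": personality_system_prompt(ffm_profile)}]

# One simulated turn: the counterpart speaks, the persona model replies.
messages.append({"role": "user",
                 "content": "The deposit covers the damage you caused. I'm not returning it."})
reply = client.chat.completions.create(model="gpt-4.1", messages=messages)
messages.append({"role": "assistant", "content": reply.choices[0].message.content})
print(reply.choices[0].message.content)
```

Looping the last four lines over successive counterpart utterances yields the multi-turn transcripts whose linguistic, emotional, and strategic properties can then be compared against the matched human dialogues.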
📝 Abstract
Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks, yet their ability to mirror human behavior in emotionally and strategically charged contexts remains underexplored. This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution by simulating multi-turn conflict dialogues that incorporate negotiation. Each LLM is guided by a matched Five-Factor personality profile to control for individual variation and enhance realism. We evaluate alignment across three dimensions: linguistic style, emotional expression (e.g., anger dynamics), and strategic behavior. GPT-4.1 achieves the closest alignment with humans in linguistic style and emotional dynamics, while Claude-3.7-Sonnet best reflects strategic behavior; substantial alignment gaps nonetheless persist. Our findings establish a benchmark for LLM-human alignment in socially complex interactions, underscoring both the promise and the limitations of personality conditioning in dialogue modeling.
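As one concrete reading of the emotional-expression dimension, the sketch below compares per-turn anger trajectories from a human negotiator and a personality-matched model. The anger scores and the choice of Pearson correlation plus a mean intensity gap are illustrative assumptions; the paper's actual alignment metrics are not reproduced here.

```python
# Hypothetical comparison of per-turn anger trajectories between a human
# negotiator and a personality-matched LLM. Scores and metrics are
# illustrative assumptions, not the study's reported measures.
from statistics import correlation, mean

# Per-turn anger intensities in [0, 1], e.g., from an emotion classifier.
human_anger = [0.2, 0.5, 0.8, 0.6, 0.3]
model_anger = [0.1, 0.4, 0.7, 0.7, 0.4]

# Shape alignment: do the trajectories rise and fall together? (Pearson r)
shape = correlation(human_anger, model_anger)

# Level alignment: mean absolute gap in intensity at each turn.
level_gap = mean(abs(h - m) for h, m in zip(human_anger, model_anger))

print(f"trajectory correlation r = {shape:.2f}, mean intensity gap = {level_gap:.2f}")
```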