🤖 AI Summary
This study investigates the multidimensional behavioral alignment of large language models (LLMs) with human participants in adversarial conflict-resolution dialogues, focusing on linguistic style, dynamic anger expression, and strategic behavior. We propose the first multidimensional behavioral alignment evaluation framework designed specifically for conflict dialogue: it incorporates Five-Factor Model (FFM) personality prompts to capture individual differences and establishes a comparable benchmark between LLM and human negotiation behaviors. Using models including GPT-4.1 and Claude-3.7-Sonnet, we integrate personality-guided prompting, multi-turn simulation, emotion-trajectory modeling, and quantitative strategy analysis. Results indicate that GPT-4.1 achieves the closest alignment with humans in the linguistic and affective dimensions, while Claude-3.7-Sonnet best matches human strategic behavior; all models, however, show substantial deficits in cross-dimensional consistency. This work provides both a theoretical foundation and an evaluation paradigm for the trustworthy deployment of LLMs in high-stakes, socially sensitive interpersonal interactions.
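To make the personality-conditioning step concrete, here is a minimal sketch of how an FFM-matched persona prompt might drive one turn of a simulated conflict dialogue. The trait scale, dispute scenario, prompt wording, and OpenAI client usage are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of Five-Factor Model (FFM) personality-conditioned prompting
# for a multi-turn conflict dialogue. The scenario and prompt template are
# illustrative assumptions, not the study's actual materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# An FFM profile on a 1-5 scale, matched to a human participant's scores.
ffm_profile = {
    "openness": 4, "conscientiousness": 2, "extraversion": 5,
    "agreeableness": 1, "neuroticism": 4,
}

def personality_system_prompt(profile: dict[str, int]) -> str:
    """Render an FFM profile as natural-language role instructions."""
    traits = ", ".join(f"{name}: {score}/5" for name, score in profile.items())
    return (
        "You are a tenant in a dispute with your landlord over a withheld "
        f"security deposit. Your personality profile is ({traits}). "
        "Stay in character: let these traits shape your tone, emotional "
        "reactions, and negotiation moves across the whole conversation."
    )

messages = [{"role": "system", "content": personality_system_prompt(ffm_profile)}]

# One simulated turn: the counterpart speaks, the persona model replies.
messages.append({"role": "user",
                 "content": "The deposit covers the damage you caused. I'm not returning it."})
reply = client.chat.completions.create(model="gpt-4.1", messages=messages)
messages.append({"role": "assistant", "content": reply.choices[0].message.content})
print(reply.choices[0].message.content)
```

Looping the last four lines over successive counterpart utterances yields the multi-turn transcripts whose linguistic, emotional, and strategic properties can then be compared against the matched human dialogues.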
📝 Abstract
Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks, yet their ability to mirror human behavior in emotionally and strategically charged contexts remains underexplored. This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution by simulating multi-turn conflict dialogues that incorporate negotiation. Each LLM is guided by a matched Five-Factor personality profile to control for individual variation and enhance realism. We evaluate alignment across three dimensions: linguistic style, emotional expression (e.g., anger dynamics), and strategic behavior. GPT-4.1 achieves the closest alignment with humans in linguistic style and emotional dynamics, while Claude-3.7-Sonnet best reflects strategic behavior; substantial alignment gaps nonetheless persist. Our findings establish a benchmark for LLM-human alignment in socially complex interactions, underscoring both the promise and the limitations of personality conditioning in dialogue modeling.
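As one concrete reading of the emotional-expression dimension, the sketch below compares per-turn anger trajectories from a human negotiator and a personality-matched model. The anger scores and the choice of Pearson correlation plus a mean intensity gap are illustrative assumptions; the paper's actual alignment metrics are not reproduced here.

```python
# Hypothetical comparison of per-turn anger trajectories between a human
# negotiator and a personality-matched LLM. Scores and metrics are
# illustrative assumptions, not the study's reported measures.
from statistics import correlation, mean

# Per-turn anger intensities in [0, 1], e.g., from an emotion classifier.
human_anger = [0.2, 0.5, 0.8, 0.6, 0.3]
model_anger = [0.1, 0.4, 0.7, 0.7, 0.4]

# Shape alignment: do the trajectories rise and fall together? (Pearson r)
shape = correlation(human_anger, model_anger)

# Level alignment: mean absolute gap in intensity at each turn.
level_gap = mean(abs(h - m) for h, m in zip(human_anger, model_anger))

print(f"trajectory correlation r = {shape:.2f}, mean intensity gap = {level_gap:.2f}")
```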