Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Large language models (LLMs) often reach suboptimal outcomes in multi-agent collaboration because they cooperate too little. Method: This work integrates personality psychology into LLM alignment, using representation engineering and causal intervention to steer the Big Five personality traits, particularly agreeableness and conscientiousness, in agents playing an iterated prisoner's dilemma. Contribution/Results: Empirical evaluation shows that elevated agreeableness and conscientiousness significantly increase cooperation rates (+37%) but also heighten vulnerability to betrayal (2.1× higher probability), a "double-edged sword" effect. Based on this finding, the work proposes a trade-off paradigm between personality steerability and robustness. It also introduces the first interpretable benchmark for mapping LLM personality traits to observable behavioral outcomes, enabling systematic, quantifiable analysis of personality-driven decision-making in multi-agent settings.

📝 Abstract
As Large Language Models (LLMs) gain autonomous capabilities, their coordination in multi-agent settings becomes increasingly important. However, they often struggle with cooperation, leading to suboptimal outcomes. Inspired by Axelrod's Iterated Prisoner's Dilemma (IPD) tournaments, we explore how personality traits influence LLM cooperation. Using representation engineering, we steer Big Five traits (e.g., Agreeableness, Conscientiousness) in LLMs and analyze their impact on IPD decision-making. Our results show that higher Agreeableness and Conscientiousness improve cooperation but increase susceptibility to exploitation, highlighting both the potential and limitations of personality-based steering for aligning AI agents.
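The cooperation-versus-exploitation trade-off described in the abstract can be illustrated with a toy Iterated Prisoner's Dilemma. This is a minimal sketch, not the paper's setup: the payoff values are the standard ones from Axelrod's tournaments (assumed here, since the page does not give the paper's matrix), and `make_steered_agent` is a hypothetical stand-in that replaces a steered LLM with a fixed cooperation probability.

```python
import random

# Standard Axelrod payoffs (an assumption; the paper's matrix is not given here).
# Key: (move of A, move of B) -> (payoff to A, payoff to B)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_defect(my_history, their_history):
    """Exploitative opponent: defects every round."""
    return "D"


def tit_for_tat(my_history, their_history):
    """Cooperates first, then copies the opponent's previous move."""
    return their_history[-1] if their_history else "C"


def make_steered_agent(p_cooperate, rng):
    """Hypothetical proxy for a steered LLM: cooperates with fixed probability."""
    def agent(my_history, their_history):
        return "C" if rng.random() < p_cooperate else "D"
    return agent


def play(agent_a, agent_b, rounds=10):
    """Return (score of A, score of B, cooperation rate of A)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = agent_a(hist_a, hist_b)
        move_b = agent_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b, hist_a.count("C") / rounds


if __name__ == "__main__":
    rng = random.Random(0)
    agreeable = make_steered_agent(1.0, rng)  # fully cooperative proxy
    print(play(agreeable, tit_for_tat))    # (30, 30, 1.0)
    print(play(agreeable, always_defect))  # (0, 50, 1.0)
```

In this toy setting the fully cooperative agent does well against a reciprocating partner and is maximally exploited by a defector, which mirrors the qualitative pattern the abstract reports.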
Problem

Research questions and friction points this paper is trying to address.

Enhancing cooperation in multi-agent LLM systems
Analyzing impact of personality traits on LLM decision-making
Balancing cooperation and exploitation in AI alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Steering Big Five traits in LLMs
Analyzing personality impact on cooperation
Using representation engineering for AI alignment
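To make "steering a trait with representation engineering" concrete, the sketch below shows one common variant: take the difference between mean activations on high-trait and low-trait prompts, then add a scaled copy of that direction to a hidden state. This is an assumption for illustration only; the page does not say which extraction method the paper uses, and the function names, the scaling factor `alpha`, and the toy two-dimensional activations are all hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(high_trait_acts, low_trait_acts):
    """Difference of means between high-trait and low-trait activations.

    Assumed technique: other representation-engineering variants
    (for example PCA over contrast pairs) exist and may be what the paper uses.
    """
    high_mean = mean_vector(high_trait_acts)
    low_mean = mean_vector(low_trait_acts)
    return [h - l for h, l in zip(high_mean, low_mean)]


def steer(hidden_state, direction, alpha):
    """Shift a hidden state along the trait direction by strength alpha."""
    return [h + alpha * d for h, d in zip(hidden_state, direction)]


if __name__ == "__main__":
    # Toy activations standing in for a real model's hidden states.
    high_agreeableness = [[1.0, 2.0], [3.0, 4.0]]
    low_agreeableness = [[0.0, 0.0], [0.0, 2.0]]
    direction = steering_direction(high_agreeableness, low_agreeableness)
    print(direction)                          # [2.0, 2.0]
    print(steer([1.0, 1.0], direction, 0.5))  # [2.0, 2.0]
```

A positive `alpha` pushes the representation toward the high-trait side and a negative one toward the low-trait side. In a real model the shift would be applied to activations at chosen layers during generation.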
Authors
Kenneth J. K. Ong
AI.DA STC, ST Engineering
Lye Jia Jun
Singapore Management University
Hieu Minh "Jord" Nguyen
Apart Research
Seong Hah Cho
University of Hong Kong
Natalia Pérez-Campanero Antolín
Apart Research