🤖 AI Summary
Large language models (LLMs) often yield suboptimal outcomes in multi-agent collaboration due to insufficient cooperative behavior. Method: This work pioneers the integration of personality psychology into LLM alignment, embedding the Big Five personality traits (particularly Agreeableness and Conscientiousness) into an iterated prisoner's dilemma framework via representation engineering and causal intervention. Contribution/Results: Empirical evaluation demonstrates that elevated Agreeableness and Conscientiousness significantly increase cooperation rates (+37%) but simultaneously heighten vulnerability to betrayal (2.1× higher probability), revealing a fundamental "double-edged sword" effect. Building on this finding, the authors propose a trade-off paradigm between personality steerability and robustness. They further introduce the first interpretable benchmark for mapping LLM personality traits to observable behavioral outcomes, enabling systematic, quantifiable analysis of personality-driven decision-making in multi-agent settings.
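The representation-engineering step described above typically works by extracting a "trait direction" from model activations and adding it back at inference time. A common recipe (not necessarily the paper's exact one) is difference-in-means: average the hidden states on trait-positive prompts, subtract the average on trait-negative prompts, and add a scaled copy of that vector to the residual stream. The toy sketch below uses random vectors as stand-ins for real LLM activations; the hidden size, scale `alpha`, and the simulated activation shift are all illustrative assumptions.

```python
import numpy as np

# Toy sketch of difference-in-means steering (representation engineering).
# Real use would cache hidden states from an LLM at a chosen layer; here,
# random vectors stand in for activations on trait-positive / -negative prompts.

rng = np.random.default_rng(0)
d = 64                                   # hidden size (toy)
trait_dir = rng.normal(size=d)
trait_dir /= np.linalg.norm(trait_dir)   # ground-truth "agreeableness" axis

# Simulated activations: trait-positive prompts shifted along the trait axis.
pos = rng.normal(size=(100, d)) + 2.0 * trait_dir
neg = rng.normal(size=(100, d)) - 2.0 * trait_dir

# Steering vector = difference of mean activations, normalized.
v = pos.mean(axis=0) - neg.mean(axis=0)
v /= np.linalg.norm(v)

# At inference time, a scaled copy is added to an activation to steer it.
alpha = 4.0                              # steering strength (assumed)
h = rng.normal(size=d)                   # some activation to steer
h_steered = h + alpha * v

print(float(v @ trait_dir))              # recovered direction ≈ ground-truth axis
```

With enough prompt pairs, the recovered vector aligns closely with the underlying trait axis, which is what makes this kind of causal intervention on behavior possible.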
📝 Abstract
As Large Language Models (LLMs) gain autonomous capabilities, their coordination in multi-agent settings becomes increasingly important. However, they often struggle with cooperation, leading to suboptimal outcomes. Inspired by Axelrod's Iterated Prisoner's Dilemma (IPD) tournaments, we explore how personality traits influence LLM cooperation. Using representation engineering, we steer Big Five traits (e.g., Agreeableness, Conscientiousness) in LLMs and analyze their impact on IPD decision-making. Our results show that higher Agreeableness and Conscientiousness improve cooperation but increase susceptibility to exploitation, highlighting both the potential and limitations of personality-based steering for aligning AI agents.