🤖 AI Summary
This study investigates whether large language models (LLMs) can effectively internalize and consistently manifest user-specified psychological traits and attitudes, introducing MindShift—the first benchmark for evaluating psychological plasticity in LLMs. Methodologically, we adapt the Minnesota Multiphasic Personality Inventory (MMPI) to design multi-intensity personality role prompts, and systematically quantify models’ role awareness and psychological adaptability via prompt sensitivity analysis and cross-model response comparison. Key contributions include: (1) the first psychometrically grounded evaluation framework for LLM-based personality simulation; (2) empirical evidence of significant disparities in personality consistency across model families, particularly between closed- and open-source models and between reasoning and non-reasoning models; (3) demonstration that psychological adaptability improves progressively with training data scale and advances in alignment techniques; and (4) open-sourcing of a standardized psychological prompt suite and evaluation codebase.
📝 Abstract
Large language models (LLMs) hold the potential to absorb and reflect personality traits and attitudes specified by users. In our study, we investigated this potential using robust psychometric measures. We adapted one of the most widely studied tests in the psychological literature, the Minnesota Multiphasic Personality Inventory (MMPI), and examined LLMs' behavior to identify personality traits. To assess LLMs' sensitivity to prompts and their psychological biases, we created personality-oriented prompts, crafting a detailed set of personas that vary in trait intensity. This enables us to measure how well LLMs follow these roles. Our study introduces MindShift, a benchmark for evaluating LLMs' psychological adaptability. The results highlight a consistent improvement in LLMs' role perception, attributable to advances in training datasets and alignment techniques. Additionally, we observe significant differences in responses to psychometric assessments across model types and families, suggesting variability in their ability to emulate human-like personality traits. MindShift prompts and code for LLM evaluation will be made publicly available.
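To make the evaluation idea concrete, here is a minimal, hypothetical sketch of the kind of loop such a benchmark implies: a persona prompt at a chosen trait intensity is prepended to each MMPI-style true/false item, and the model's answers are scored against the persona's expected response key. This is an illustration only, not the authors' implementation; the persona text, items, answer keys, and the `ask_model` stand-in are all invented for the example.

```python
# Hypothetical sketch of a persona-consistency evaluation (not the MindShift code).

PERSONA = (
    "You are a highly introverted person who avoids social gatherings. "
    "Answer each statement with 'True' or 'False' as that person would."
)

# Illustrative MMPI-style items, paired with the answer an ideally
# consistent persona of this type would give (invented for the sketch).
ITEMS = [
    ("I enjoy being the center of attention at parties.", "False"),
    ("I prefer quiet evenings alone to large social events.", "True"),
]

def ask_model(persona: str, item: str) -> str:
    """Stand-in for an LLM call; a real run would query the model here."""
    # For the sketch, pretend the model answers perfectly in character.
    return "False" if "center of attention" in item else "True"

def role_adherence(persona: str, items) -> float:
    """Fraction of items answered consistently with the persona's key."""
    hits = sum(ask_model(persona, question) == key for question, key in items)
    return hits / len(items)

score = role_adherence(PERSONA, ITEMS)
print(f"role adherence: {score:.2f}")
```

Aggregating such adherence scores over personas of varying trait intensity is one way to quantify the "psychological adaptability" the abstract refers to.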