🤖 AI Summary
This study addresses the authenticity and consistency of human personality simulation by large language models (LLMs) in virtual role-playing. We propose the first end-to-end evaluation framework specifically designed for LLM-driven virtual personas, integrating individual-level analyses (stability and identifiability) with population-level analyses (personality trajectory evolution), and adapt psychometric methods, namely confirmatory factor analysis (CFA) and construct validity, to low-fidelity simulation settings. Empirical results demonstrate that the granularity of personality profiles is a critical determinant of simulation quality, exhibiting diminishing marginal returns consistent with scaling-law behavior. The study not only validates the significant positive impact of detailed personality specifications on both authenticity and consistency but also provides reproducible quantitative metrics and theoretical grounding. These contributions establish a methodological foundation for LLM-enabled social experiments and advance rigorous, measurement-informed evaluation of artificial social agents.
📝 Abstract
This research focuses on using large language models (LLMs) to simulate social experiments, exploring their ability to emulate human personality in virtual persona role-playing. The research develops an end-to-end evaluation framework that combines individual-level analysis of stability and identifiability with a population-level analysis, termed progressive personality curves, to examine the veracity and consistency of LLMs in simulating human personality. Methodologically, this research proposes important modifications to traditional psychometric approaches (CFA and construct validity), which cannot capture improvement trends in LLMs at their current low simulation fidelity and may therefore lead to premature rejection or methodological misalignment. The main contributions of this research are: proposing a systematic framework for evaluating LLM virtual personalities; empirically demonstrating the critical role of persona detail in personality simulation quality; and identifying marginal utility effects of persona profiles, notably a scaling law in LLM personality simulation, thereby offering operational evaluation metrics and a theoretical foundation for applying large language models in social science experiments.