🤖 AI Summary
Problem: Current LLM-based user persona generation suffers from superficial descriptions, format constraints (overreliance on textual and numerical outputs), inconsistent prompt design, and a lack of cross-model evaluation.
Method: This study systematically analyzes 83 persona-generation prompts drawn from 27 prior works (the first cross-study synthesis of persona prompt engineering), combining qualitative coding with quantitative analysis to examine output formats, attribute types, and prompting patterns.
Contribution/Results: We find that over 50% of prompts enforce structured outputs (e.g., JSON) and 74% use dynamic variable injection, yet >90% of studies employ only a few prompts and omit comparative LLM evaluations. The analysis reveals critical limitations in persona richness, multidimensionality, and methodological rigor. To address these, we propose three optimization pathways: (1) structured guidance for consistent output semantics, (2) dynamic contextual modeling for adaptive persona instantiation, and (3) multi-model collaborative evaluation for robust assessment. This work offers both methodological reflection and practical benchmarks for computational user representation.
📝 Abstract
We analyzed 83 persona prompts from 27 research articles that used large language models (LLMs) to generate user personas. Findings show that the prompts predominantly generate single personas. Several prompts ask for short or concise persona descriptions, which deviates from the tradition of creating rich, informative, and rounded persona profiles. Text is the most common format for generated persona attributes, followed by numbers; text and numbers are often generated together, and demographic attributes are included in nearly all generated personas. Researchers use up to 12 prompts in a single study, though most studies use only a few. Comparing and testing multiple LLMs is rare. More than half of the prompts require persona output in a structured format, such as JSON, and 74% of the prompts insert data or dynamic variables. We discuss the implications of the increasing use of computational personas for user representation.
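Two prompting patterns quantified above, inserting data or dynamic variables into a prompt and requiring persona output as JSON, can be illustrated with a minimal Python sketch. The template wording, the attribute schema (text and numeric attributes, including demographics such as age and occupation), and the helpers `build_prompt` and `parse_persona` are hypothetical and are not taken from any of the 83 analyzed prompts. The LLM call itself is omitted, and `sample_reply` is an invented stand-in for a model response.

```python
import json
from string import Template

# Hypothetical persona prompt. The $-placeholders are dynamic variables that
# are filled from data (e.g., survey results) when the prompt is built.
PERSONA_PROMPT = Template(
    "Create one user persona for a $product, based on this survey summary:\n"
    "$survey_summary\n"
    "Return only a JSON object with the keys name (text), age (number), "
    "occupation (text), goals (list of text), and pain_points (list of text)."
)

# Expected Python type of each attribute in the hypothetical JSON schema.
EXPECTED_TYPES = {
    "name": str,
    "age": int,
    "occupation": str,
    "goals": list,
    "pain_points": list,
}


def build_prompt(product: str, survey_summary: str) -> str:
    """Insert the dynamic variables; substitute() raises KeyError if one is missing."""
    return PERSONA_PROMPT.substitute(product=product, survey_summary=survey_summary)


def parse_persona(raw_output: str) -> dict:
    """Parse a JSON reply and check that every expected attribute has the right type."""
    persona = json.loads(raw_output)
    if not isinstance(persona, dict):
        raise ValueError("expected a JSON object")
    for key, expected in EXPECTED_TYPES.items():
        if key not in persona:
            raise ValueError(f"missing attribute: {key}")
        if not isinstance(persona[key], expected):
            raise ValueError(f"attribute {key!r} should be {expected.__name__}")
    return persona


# Usage: the LLM call is omitted; sample_reply is an invented stand-in for a model response.
prompt = build_prompt("budgeting app", "Most respondents are students who log expenses weekly.")
sample_reply = (
    '{"name": "Maya", "age": 21, "occupation": "student", '
    '"goals": ["save for travel"], "pain_points": ["forgets to log purchases"]}'
)
persona = parse_persona(sample_reply)
print(persona["name"], persona["age"])  # Maya 21
```

Requiring JSON lets a study check attribute presence and types programmatically, which is one practical appeal of structured output; the same fixed key list may also narrow a persona to the fields it names.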