AI Summary
In computational social science, LLM-based role prompting typically relies on manually crafted personas. Reproducible, nationally representative persona resources grounded in survey data are lacking, which limits simulation fidelity and demographic alignment. To address this, we introduce GGP (German General Personas), the first German-language persona prompt library derived from ALLBUS, the nationally representative German General Social Survey. GGP uses statistically driven attribute selection and structured prompt engineering to map demographic variables onto LLM-interpretable persona descriptions. It is compatible with diverse large language models, and GGP-guided models outperform state-of-the-art classifiers, particularly in low-data regimes. Empirical evaluation shows that GGP-guided models more accurately reproduce observed survey response distributions, improving demographic representativeness and behavioral plausibility in social simulations.
Abstract
The use of Large Language Models (LLMs) for simulating human perspectives via persona prompting is gaining traction in computational social science. However, well-curated, empirically grounded persona collections remain scarce, limiting the accuracy and representativeness of such simulations. Here we introduce the German General Personas (GGP) collection, a comprehensive and representative persona prompt collection built from the German General Social Survey (ALLBUS). The GGP and its persona prompts are designed to be easily plugged into prompts for all types of LLMs and tasks, steering models to generate responses aligned with the underlying German population. We evaluate GGP by prompting various LLMs to simulate survey response distributions across diverse topics, demonstrating that GGP-guided LLMs outperform state-of-the-art classifiers, particularly under data scarcity. Furthermore, we analyze how the representativity and attribute selection within persona prompts affect alignment with population responses. Our findings suggest that GGP provides a potentially valuable resource for research on LLM-based social simulations that enables more systematic explorations of population-aligned persona prompting in NLP and social science research.
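As a rough illustration of the workflow described above, the following minimal sketch renders survey-style attributes as a persona prompt and compares a simulated response distribution with an observed one. Everything in it is an assumption made for illustration: the attribute names, the prompt wording, the function names, and the use of total variation distance as the alignment measure are hypothetical and are not taken from GGP or ALLBUS. The LLM call itself is omitted; the simulated answers are placeholder values.

```python
from collections import Counter


def build_persona_prompt(attributes: dict[str, str]) -> str:
    """Render demographic attributes as a short persona description.

    The wording is a hypothetical template, not the GGP prompt format.
    """
    parts = [f"{name}: {value}" for name, value in attributes.items()]
    return "You are a survey respondent living in Germany. " + "; ".join(parts) + "."


def response_distribution(answers: list[str], options: list[str]) -> dict[str, float]:
    """Turn a list of answers into relative frequencies over the given options."""
    if not answers:
        raise ValueError("answers must not be empty")
    counts = Counter(answers)
    return {opt: counts[opt] / len(answers) for opt in options}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance between two distributions on the same options."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


# Hypothetical persona attributes (illustrative values only).
persona = {"age": "34", "gender": "female", "federal state": "Bavaria"}
prompt = build_persona_prompt(persona)

# Placeholder answers standing in for LLM outputs across several personas.
options = ["agree", "disagree"]
simulated = response_distribution(["agree", "agree", "disagree", "agree"], options)

# Placeholder observed distribution standing in for survey data.
observed = {"agree": 0.5, "disagree": 0.5}

distance = total_variation(simulated, observed)
```

A lower distance would indicate closer agreement between simulated and observed responses under this assumed measure; the evaluation in the paper may use a different metric.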