Should LLM Agents Decide in Social Simulations? Comparing Finite-State and LLM-Based Decision Policies

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of keeping large language models (LLMs) aligned with predefined behavioral policies in social simulations, where an LLM acting as the decision-making component may deviate from the intended policy and thereby compromise system dynamics and interpretability. The authors quantify alignment between LLM action selectors and an explicit first-order Markov policy, evaluating three open-weight models (LLaMA 3.1, GPT-OSS, and Mistral 24B) on a synthetic social network of 1,000 agents and 10,000 action decisions under base, guided, and probabilistic prompting. Policy consistency is measured using Jensen-Shannon divergence with Laplace smoothing, and execution time is reported. Results indicate that certain LLM configurations can approximate the reference policy, but alignment is unreliable and varies across models and prompts. Additional guidance can introduce systematic action biases, and even the best-aligned configurations are several hundred times slower than direct Markov chain sampling, suggesting that LLM-based action selection is not a direct replacement for explicit policy implementations.
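To make the alignment metric concrete, the following is a minimal sketch of Jensen-Shannon divergence computed over Laplace-smoothed action counts. It is not the authors' implementation: the action set, the smoothing constant `alpha=1.0`, the base-2 logarithm, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical action set; the source does not list the simulated actions.
ACTIONS = ["post", "like", "share", "comment"]


def smoothed_distribution(observed, actions=ACTIONS, alpha=1.0):
    """Turn observed actions into probabilities with add-alpha (Laplace) smoothing."""
    counts = Counter(observed)
    total = len(observed) + alpha * len(actions)
    return [(counts[a] + alpha) / total for a in actions]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical inputs, at most 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Assumed toy data: actions drawn from the reference policy vs. an LLM selector.
reference = smoothed_distribution(["like", "like", "post", "comment"])
candidate = smoothed_distribution(["like", "share", "share", "share"])
score = js_divergence(reference, candidate)
```

Smoothing keeps every action probability strictly positive, so an action that one selector never chooses still yields a well-defined comparison. With the base-2 logarithm assumed here, a lower `score` means the candidate's action distribution is closer to the reference.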

📝 Abstract

Large language models (LLMs) are increasingly used as decision-making components in social simulations. This introduces a methodological risk: the simulation may deviate from the explicit behavioral policy defined by the researcher. In online social network (OSN) simulations, action choices shape system dynamics, interaction patterns, and model interpretability. This paper evaluates whether LLM action selectors preserve an interpretable reference policy in an OSN simulation. The reference is a finite state machine implemented as a first-order Markov model, with transition probabilities depending on the user type. The evaluation uses a synthetic network with 1,000 agents and 10,000 action decisions. Three open-weight LLMs are tested: LLaMA 3.1, GPT-OSS, and Mistral 24B. Each model is evaluated under three prompting strategies: base, guided, and probabilistic. Alignment is measured using Jensen-Shannon Divergence with Laplace smoothing, and execution time is reported. Results show that LLMs can approximate the reference policy in some configurations, but do not preserve it reliably. Alignment varies across models and prompts, and additional guidance can introduce systematic action biases. Even the best-aligned LLM configurations are several hundred times slower than direct Markov chain sampling. These findings indicate that LLM-based action selection is not a direct replacement for explicit decision policies: it can alter the intended behavior while increasing computational cost.
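The reference policy described above, a first-order Markov model whose transition probabilities depend on user type, can be sketched as follows. The user types, action names, probabilities, and function names are all hypothetical, since the abstract does not specify them; this only illustrates what direct Markov chain sampling looks like.

```python
import random

# Hypothetical user types, actions, and probabilities, for illustration only.
# TRANSITIONS[user_type][current_action] maps each next action to its probability.
TRANSITIONS = {
    "casual": {
        "idle": {"idle": 0.7, "like": 0.2, "post": 0.1},
        "like": {"idle": 0.5, "like": 0.4, "post": 0.1},
        "post": {"idle": 0.6, "like": 0.3, "post": 0.1},
    },
    "active": {
        "idle": {"idle": 0.2, "like": 0.4, "post": 0.4},
        "like": {"idle": 0.1, "like": 0.5, "post": 0.4},
        "post": {"idle": 0.1, "like": 0.4, "post": 0.5},
    },
}


def next_action(user_type, current_action, rng):
    """Sample the next action from the row selected by user type and current action."""
    row = TRANSITIONS[user_type][current_action]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


def simulate(user_type, start_action, steps, seed=0):
    """Generate a trajectory of `steps` actions; the start action is not included."""
    rng = random.Random(seed)
    trajectory = []
    current = start_action
    for _ in range(steps):
        current = next_action(user_type, current, rng)
        trajectory.append(current)
    return trajectory


trajectory = simulate("active", "idle", steps=10, seed=42)
```

Each decision here is a single weighted draw from a small lookup table, which is consistent with the reported cost gap relative to an LLM call per decision. An LLM-based selector would replace `next_action` with a prompt and a parsed response, and its empirical action frequencies could then be compared against those produced by this sampler.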

Problem

Research questions and friction points this paper is trying to address.

LLM agents

social simulations

decision policies

behavioral alignment

interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agents

social simulation

decision policy