🤖 AI Summary
This paper investigates whether large language models (LLMs) can approximate human higher-order strategic reasoning through recursive inference. Method: The authors implement a role-based multi-agent strategic interaction framework in which an umpire facilitates games, from matchmaking through move validation to environment management, and LLM-backed players make decisions using a formal hypergame-based model of hierarchical beliefs. Recursive reasoning is evaluated on one-shot, 2-player beauty contests and compared against an established baseline model from economics and data from human experiments. The paper also introduces the foundations of a semantic measure of reasoning as an alternative to k-level theory. Results: In the reported experiments, the artificial reasoners can outperform the baseline model both in approximating human behaviour and in reaching the optimal solution.
📝 Abstract
LLM-driven multi-agent simulations are gaining traction, with applications in game-theoretic and social simulation. While most implementations seek to exploit or evaluate LLM-agentic reasoning, they often do so with a weak notion of agency and simplified architectures. We implement a role-based multi-agent strategic interaction framework tailored to sophisticated recursive reasoners, providing the means for systematic, in-depth development and evaluation of strategic reasoning. Our game environment is governed by an umpire responsible for facilitating games, from matchmaking through move validation to environment management. Players incorporate state-of-the-art LLMs in their decision mechanism, relying on a formal hypergame-based model of hierarchical beliefs. We use one-shot, 2-player beauty contests to evaluate the recursive reasoning capabilities of the latest LLMs, comparing them against an established baseline model from economics and data from human experiments. Furthermore, we introduce the foundations of a semantic measure of reasoning as an alternative to k-level theory. Our experiments show that artificial reasoners can outperform the baseline model both in approximating human behaviour and in reaching the optimal solution.
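For readers unfamiliar with the setting, the following is a minimal sketch of textbook level-k reasoning in a one-shot, 2-player beauty contest. It is not the paper's framework or its baseline model. The multiplier `p = 2/3`, the level-0 anchor of 50, and the function names `level_k_guess` and `winner` are illustrative assumptions, and the level-k rule used here is the common simplification in which a player ignores their own influence on the average.

```python
def level_k_guess(k: int, p: float = 2 / 3, level0: float = 50.0) -> float:
    """Guess of a level-k player under the simplified textbook rule.

    Assumptions (not from the paper): a level-0 player guesses `level0`,
    and a level-k player multiplies the level-(k-1) guess by `p`.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return level0 * p ** k


def winner(guess_a: float, guess_b: float, p: float = 2 / 3) -> str:
    """Return 'a', 'b', or 'tie' for a 2-player beauty contest.

    The target is p times the mean of the two guesses; the guess
    closer to the target wins.
    """
    target = p * (guess_a + guess_b) / 2
    dist_a = abs(guess_a - target)
    dist_b = abs(guess_b - target)
    if dist_a == dist_b:
        return "tie"
    return "a" if dist_a < dist_b else "b"


if __name__ == "__main__":
    # Guesses shrink toward 0 as the reasoning depth k grows.
    for k in range(4):
        print(k, round(level_k_guess(k), 2))
    # A level-2 player beats a level-1 player under these assumptions.
    print(winner(level_k_guess(2), level_k_guess(1)))
```

With `p < 1` and two players, the lower guess is always at least as close to the target, which is why deeper recursion pays off in this toy version and why guessing 0 is the optimal solution.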