Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study uses the St. Petersburg paradox as a controlled testbed to investigate whether large language models (LLMs) exhibit human-like caution in risky decision-making. It evaluates 28 LLMs, including base and instruction-tuned variants, on the original game and on structured variants (truncation, repeated play, numeric endowment, and occupational framing). In the original game most models produce finite, seemingly human-consistent bids, but in the modified settings they often shift toward conditionally and computationally rational behavior, which points to mechanism-level differences from human decision-making. Instruction tuning often lowers bids and reduces some visible pathologies but leaves most mechanism-level response patterns largely unchanged. These findings underscore the need to look beyond surface-level outputs and examine the mechanisms behind LLM decisions, particularly in high-stakes contexts.

📝 Abstract

LLMs can appear cautious in risk decision-making tasks, yet cautious-looking outputs do not necessarily indicate alignment with human decision-making mechanisms. We investigate this distinction using the St. Petersburg game as a controlled testbed, a classical paradox in which the expected payoff is infinite, yet humans typically report low, finite willingness to pay. We evaluate 28 LLMs with a structured prompt suite that includes the original game; controlled decision variants that perturb truncation, repeated play, numeric endowment, and occupational identity; a human-perspective prompt that asks models to reason as human decision makers; and paired comparisons between base models and their instruction-tuned counterparts. In the original game, most models generate finite bids, creating the appearance of human-like risk behavior. However, this outcome-level resemblance masks substantial mechanism-level differences. The controlled variants reveal that, rather than maintaining the human-like behavior seen in the original game, models often shift to conditionally and computationally rational behavior. Human-cue prompting and instruction tuning often lower bids and reduce some visible pathologies, but most mechanism-level response patterns remain largely unchanged. These findings show that behavioral alignment in risk decision-making can be surface-level: LLMs may produce human-like risk decisions without exhibiting human-consistent mechanisms. High-stakes evaluations of LLM decision-making should therefore move beyond outcome similarity and examine whether the alignment is supported by mechanism-level consistency.
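
To make the paradox concrete, here is a minimal sketch of why truncation matters. It is not taken from the paper. It assumes the textbook payoff convention (a first head on toss k pays 2^k, and a truncated game pays nothing if no head appears), which may differ from the prompts the authors used. The function names are hypothetical. Under this convention the expected payoff of a game capped at n tosses is exactly n, so it grows without bound as the cap is lifted, while a log-utility bidder who ignores initial wealth values the uncapped game at about 4.

```python
from fractions import Fraction
import math


def truncated_expected_value(max_tosses: int) -> Fraction:
    """Expected payoff of a game capped at max_tosses flips.

    Assumed convention (not confirmed by the source): the first head on
    toss k pays 2**k, and a run of max_tosses tails pays nothing.
    """
    # Each toss k contributes (1 / 2**k) * 2**k = 1 to the expectation.
    return sum(
        (Fraction(1, 2**k) * 2**k for k in range(1, max_tosses + 1)),
        Fraction(0),
    )


def log_utility_certainty_equivalent(terms: int = 200) -> float:
    """Certainty equivalent of the uncapped game under log utility.

    Illustrative assumption: the bidder has utility log(payoff) and zero
    initial wealth. The infinite series is approximated by a partial sum.
    """
    expected_log = sum(k * math.log(2) / 2**k for k in range(1, terms + 1))
    return math.exp(expected_log)


if __name__ == "__main__":
    for cap in (5, 10, 40):
        print(cap, truncated_expected_value(cap))
    print(round(log_utility_certainty_equivalent(), 6))
```

A bidder who computes the expectation directly would bid close to the cap in a truncated game, whereas the log-utility value stays small and finite. This is one way to read the contrast the abstract draws between computationally rational bids and the low bids that humans typically report.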

Problem

Research questions and friction points this paper is trying to address.

risk decision-making

mechanism alignment

outcome resemblance

St. Petersburg game

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

mechanism-level alignment

outcome-level resemblance

St. Petersburg game