Superficial Beliefs in LLM Decision-Making

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether large language models follow a systematic internal decision structure in binary choices or merely mimic surface-level reasoning. Using synthetic tasks, the authors compare the attributes that models report as most important against behaviorally inferred drivers, using behavioral modeling, self-report analysis, attribute masking, and a range of prompt perturbations. The work introduces the concept of “superficial belief”: models exhibit predictable, systematic behavior, captured by behavioral models that forecast held-out choices, yet their explicit justifications only partially reflect the factors that actually drive their decisions. This gap between stated reasons and behavioral drivers persists across multiple experimental conditions, suggesting that current models have only limited verbal access to what drives their choices.

📝 Abstract

We ask whether large language models (LLMs) merely imitate rationales when choosing between two options, or whether their choices reflect a systematic underlying decision structure. Using synthetic binary decision settings in which models choose between profiles defined by graded attributes, we compare the attribute a model says mattered most with the attribute that best explains its choice under a behavioural model fit to prior decisions. The behavioural model predicts held-out choices well, showing that model behaviour is systematically related to the visible attributes rather than being random. However, direct self-reports and a separate score-based judge recover the behaviourally inferred driver only partially. The resulting picture is neither one of arbitrary behaviour nor one of fully articulated belief: outputs are structured enough to support prediction, but explicit reasons track the recovered driver only imperfectly. This qualitative pattern persists across prompt-order and sampling perturbations, alternative behavioural models, targeted occlusion analyses, and structurally varied decision settings. We interpret this as evidence for "superficial belief" in LLM decision-making: models behave as if guided by probabilistic local priorities over attributes, while having only limited verbal access to the attributes that drive their decisions.
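
The comparison described in the abstract can be illustrated with a small, purely synthetic sketch. It is not the authors' implementation: the abstract does not name the behavioural model, so a logistic regression on attribute differences is assumed here, and the "driver" of a choice is assumed to be the attribute whose weighted difference pushes hardest toward the chosen option. The names `fit_logistic`, `driver`, `run_demo`, `hidden_w`, and `report_fidelity` are hypothetical, and both the choices and the self-reports are simulated rather than collected from an LLM.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, d):
    return sum(wk * dk for wk, dk in zip(w, d))


def fit_logistic(diffs, choices, lr=0.1, epochs=300):
    """Fit w so that P(choose A) = sigmoid(w . (a - b)), by gradient ascent."""
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for d, y in zip(diffs, choices):
            err = y - sigmoid(dot(w, d))
            for k, dk in enumerate(d):
                grad[k] += err * dk
        w = [wk + lr * g / len(diffs) for wk, g in zip(w, grad)]
    return w


def driver(w, d, chose_a):
    """ASSUMED definition: attribute pushing hardest toward the chosen option."""
    sign = 1.0 if chose_a else -1.0
    contrib = [sign * wk * dk for wk, dk in zip(w, d)]
    return max(range(len(w)), key=lambda k: contrib[k])


def run_demo(seed=0, n_train=400, n_test=200, report_fidelity=0.6):
    rng = random.Random(seed)
    # ASSUMPTION: hidden priorities standing in for an LLM's decision structure.
    hidden_w = [2.0, 0.5, 0.1]

    def sample():
        # Two profiles with graded attributes on a 1..5 scale.
        a = [rng.randint(1, 5) for _ in hidden_w]
        b = [rng.randint(1, 5) for _ in hidden_w]
        d = [x - y for x, y in zip(a, b)]
        chose_a = rng.random() < sigmoid(dot(hidden_w, d))
        # Simulated self-report: names the true driver only some of the time.
        true_driver = driver(hidden_w, d, chose_a)
        if rng.random() < report_fidelity:
            report = true_driver
        else:
            others = [k for k in range(len(hidden_w)) if k != true_driver]
            report = rng.choice(others)
        return d, chose_a, report

    train = [sample() for _ in range(n_train)]
    test = [sample() for _ in range(n_test)]

    # Behavioural model fit to prior decisions only.
    w = fit_logistic([d for d, _, _ in train],
                     [1.0 if c else 0.0 for _, c, _ in train])

    # Held-out predictive accuracy of the behavioural model.
    accuracy = sum((dot(w, d) >= 0) == c for d, c, _ in test) / n_test
    # Share of held-out trials where the self-report names the inferred driver.
    agreement = sum(driver(w, d, c) == r for d, c, r in test) / n_test
    return accuracy, agreement, w


if __name__ == "__main__":
    acc, agree, weights = run_demo()
    print(f"held-out accuracy: {acc:.2f}")
    print(f"self-report agreement with inferred driver: {agree:.2f}")
```

In this toy setup, high held-out accuracy combined with only partial agreement is the qualitative pattern the abstract describes: behaviour is predictable from the visible attributes even when stated reasons miss the inferred driver on a sizeable share of trials.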

Problem

Research questions and friction points this paper is trying to address.

large language models

decision-making

superficial belief

behavioral modeling

attribute attribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

superficial belief

large language models

decision-making