🤖 AI Summary
This study investigates how cognitive load modulates the influence of cognitive biases, such as the framing effect and status quo bias, on human decision-making in conversational contexts. In a large-scale preregistered experiment, the authors manipulated dialogue complexity to induce varying levels of cognitive load and used large language models (including GPT-4, GPT-5, and open-source alternatives), given each user's demographic data and dialogue history, to predict individual choices in canonical decision tasks. GPT-4-based models not only reproduced human bias patterns but also captured the interaction between cognitive load and bias intensity, showing heightened bias under high load, and they did so with significantly better accuracy than the other models. This work provides a first systematic validation of large language models' capacity to simulate human irrational decision-making, offering a theoretical and methodological foundation for intelligent dialogue systems that adapt to users' cognitive biases.
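To make the prediction setup concrete, the sketch below shows one plausible way to query an LLM for a participant's likely choice given demographics and prior dialogue, using the OpenAI Python client. The prompt wording, the `predict_choice` helper, and the example task are illustrative assumptions, not the authors' actual protocol or prompts.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def predict_choice(demographics: dict, dialogue_history: list[dict], task_prompt: str) -> str:
    """Ask the model which option this participant would most likely choose (hypothetical setup)."""
    profile = ", ".join(f"{k}: {v}" for k, v in demographics.items())
    messages = [
        {"role": "system",
         "content": "You predict the decision a specific study participant would make. "
                    "Answer with exactly 'A' or 'B'."},
        {"role": "user", "content": f"Participant profile: {profile}"},
        *dialogue_history,  # prior chatbot turns, e.g. {"role": "assistant"/"user", "content": ...}
        {"role": "user", "content": task_prompt},
    ]
    response = client.chat.completions.create(model="gpt-4", messages=messages, temperature=0)
    return response.choices[0].message.content.strip()

# Example call with a gain-framed variant of a classic risky-choice problem (illustrative only)
choice = predict_choice(
    demographics={"age": 34, "gender": "female", "education": "bachelor"},
    dialogue_history=[{"role": "assistant", "content": "Earlier we discussed your travel plans..."}],
    task_prompt=("Option A saves 200 of 600 people for certain; Option B saves all 600 with "
                 "probability 1/3 and no one with probability 2/3. Which would you pick, A or B?"),
)
print(choice)
```

The same call could be repeated per participant and per task variant, with accuracy scored against the recorded human choices.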
📝 Abstract
We examine whether large language models (LLMs) can predict biased decision-making in conversational settings, and whether their predictions capture not only human cognitive biases but also how those effects change under cognitive load. In a pre-registered study (N = 1,648), participants completed six classic decision-making tasks via a chatbot with dialogues of varying complexity. Participants exhibited two well-documented cognitive biases: the Framing Effect and the Status Quo Bias. Increased dialogue complexity led participants to report higher mental demand, and this added cognitive load selectively but significantly amplified the biases, demonstrating a load-bias interaction. We then evaluated whether LLMs (GPT-4, GPT-5, and open-source models) could predict individual decisions given demographic information and prior dialogue. While results were mixed across choice problems, LLM predictions that incorporated dialogue context were significantly more accurate in several key scenarios. Importantly, their predictions reproduced the same bias patterns and load-bias interactions observed in humans. Across all models tested, the GPT-4 family aligned most consistently with human behavior, outperforming GPT-5 and open-source models in both predictive accuracy and fidelity to human-like bias patterns. These findings advance our understanding of LLMs as tools for simulating human decision-making and inform the design of conversational agents that adapt to user biases.
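As a reading aid, the following minimal sketch shows how a load-bias interaction of this kind could be quantified with a logistic regression containing a frame × load interaction term, run here on synthetic data via statsmodels. The variable names, simulated effect sizes, and the regression itself are assumptions for illustration, not the study's preregistered analysis.

```python
# Illustrative load-bias interaction analysis on synthetic data (not the authors' model).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1648                              # matches the reported sample size; the data are synthetic
frame = rng.integers(0, 2, n)         # 0 = gain frame, 1 = loss frame
load = rng.integers(0, 2, n)          # 0 = simple dialogue, 1 = complex dialogue

# Simulate a framing effect whose strength grows under high cognitive load
logit = -0.2 + 0.8 * frame + 0.1 * load + 0.6 * frame * load
risky = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame({"risky": risky, "frame": frame, "load": load})
model = smf.logit("risky ~ frame * load", data=df).fit(disp=False)
print(model.summary())                # the frame:load coefficient captures the load-bias interaction
```

The same regression could be fit to LLM-predicted choices in place of human choices to check whether the model reproduces the interaction, which is the comparison the abstract describes.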