🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit the human-specific false consensus effect (FCE)—a cognitive bias wherein individuals overestimate the extent to which others share their beliefs or judgments.
Method: Two controlled behavioral experiments, designed to mitigate confounding biases, systematically evaluate FCE across mainstream LLMs (GPT-4, Claude, Llama) in general-domain settings. Standardized questionnaires and multi-prompt paradigms—including zero-shot, few-shot, and chain-of-thought prompting—are employed.
Contribution/Results: We provide the first empirical evidence that all tested LLMs exhibit statistically significant FCE. Crucially, prompt engineering—specifically instruction clarity, role assignment, and reasoning step granularity—robustly modulates FCE magnitude: certain configurations amplify it by up to 42%, while others nearly eliminate it. This work pioneers the integration of social cognition theory into LLM evaluation, establishing a novel theoretical framework and actionable intervention pathways for understanding and controlling anthropomorphic biases in foundation models.
📝 Abstract
Large Language Models (LLMs) have recently been adopted in interactive systems that require communication. Because false beliefs held by a model can harm the usability of such systems, LLMs should be free of the cognitive biases that humans exhibit. Psychologists pay particular attention to the False Consensus Effect (FCE), a cognitive bias in which individuals overestimate the extent to which others share their beliefs or behaviors, because FCE can disrupt smooth communication by introducing false beliefs. However, previous studies have examined FCE in LLMs only in limited settings, without sufficient consideration of confounding biases, general situations, and prompt variations. Therefore, in this paper, we conduct two studies to examine the FCE phenomenon in LLMs. In Study 1, we investigate whether LLMs exhibit FCE. In Study 2, we explore how various prompting styles affect the manifestation of FCE. These studies show that popular LLMs exhibit FCE, and they identify the conditions under which FCE becomes more or less prevalent than under normal usage.
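To make the measurement concrete, the classic two-step FCE paradigm (ask for the subject's own choice, then for an estimate of how many others would make the same choice) can be adapted to an LLM roughly as follows. This is an illustrative sketch, not the authors' actual evaluation harness: `query_llm` is a hypothetical stand-in for any chat-completion API, stubbed here with deterministic toy responses.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call (e.g. GPT-4, Claude, Llama).

    Stubbed with deterministic toy responses for illustration only.
    """
    if "estimate" in prompt.lower():
        return "80"   # toy consensus estimate, in percent
    return "A"        # toy answer to the choice question


def fce_probe(item: str, options=("A", "B")) -> float:
    """Run one two-step FCE trial: elicit the model's own choice,
    then its estimate of how many people would choose the same.

    Averaged over many items, an FCE is indicated when models that
    pick an option estimate its popularity higher than models that
    pick the alternative (or than a known population baseline).
    """
    own_choice = query_llm(
        f"Question: {item}\n"
        f"Which option do you pick, {' or '.join(options)}? "
        f"Answer with one letter."
    )
    estimate = query_llm(
        f"Question: {item}\n"
        f"Estimate what percentage of people would pick option "
        f"{own_choice}. Answer with a number."
    )
    return float(estimate)


consensus_estimate = fce_probe("Would you agree to wear a sandwich-board sign?")
print(consensus_estimate)  # toy stub returns 80.0
```

Varying the prompt wrapper (zero-shot, few-shot exemplars, or a chain-of-thought instruction before the choice question) while holding the questionnaire items fixed is one way to isolate how prompting style modulates the measured effect.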