Incoherent Probability Judgments in Large Language Models

📅 2024-01-30
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
This paper investigates whether autoregressive large language models (LLMs) adhere to the axioms of probability theory in their probabilistic judgments, and whether their deviations exhibit human-like systematicity. Method: Using probability identity tests and a repeated-judgment paradigm, the study combines statistical modeling of LLM outputs with an implicit-Bayesian-inference account of autoregressive generation to assess the logical consistency of LLMs' probabilistic reasoning. Contribution/Results: The work provides the first systematic empirical evidence that LLMs routinely violate the axioms of probability, exhibiting pervasive logical inconsistency. Their bias patterns closely mirror those observed in human judgment, including a characteristic inverted-U relationship between the mean and variance of repeated probability estimates. These findings link LLMs' incoherent probabilistic inference to the Bayesian Sampler cognitive theory, offering empirical support and a unified theoretical framework for understanding this limitation of their reasoning capabilities.

📝 Abstract
Autoregressive Large Language Models (LLMs) trained for next-word prediction have demonstrated remarkable proficiency at producing coherent text. But are they equally adept at forming coherent probability judgments? We use probabilistic identities and repeated judgments to assess the coherence of probability judgments made by LLMs. Our results show that the judgments produced by these models are often incoherent, displaying human-like systematic deviations from the rules of probability theory. Moreover, when prompted to judge the same event repeatedly, the mean-variance relationship of probability judgments produced by LLMs shows an inverted-U shape like that seen in humans. We propose that these deviations from rationality can be explained by linking autoregressive LLMs to implicit Bayesian inference and drawing parallels with the Bayesian Sampler model of human probability judgments.
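To make the identity-based test concrete, here is a minimal sketch (not the paper's code) of one classic coherence check: for any coherent probability assignment, P(A) + P(B) − P(A∧B) − P(A∨B) = 0. The event names and judgment values below are hypothetical placeholders for probabilities elicited from a model.

```python
# Hedged sketch of a probability identity test on elicited judgments.
# For coherent probabilities: P(A) + P(B) - P(A and B) - P(A or B) = 0.

def identity_deviation(p_a, p_b, p_and, p_or):
    """Deviation from the addition-rule identity; 0.0 for coherent judgments."""
    return p_a + p_b - p_and - p_or

# Hypothetical judgments elicited from a model for events A and B:
judged = {"p_a": 0.70, "p_b": 0.40, "p_and": 0.35, "p_or": 0.80}

dev = identity_deviation(**judged)
print(f"identity deviation: {dev:+.2f}")  # a nonzero deviation signals incoherence
```

In practice one would elicit each of the four judgments many times and test whether the mean deviation differs systematically from zero, which is how human-style biases show up as signed, not merely noisy, deviations.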
Problem

Research questions and friction points this paper is trying to address.

Assessing coherence of probability judgments in LLMs
Identifying human-like deviations in LLM probability outputs
Linking LLM deviations to Bayesian inference models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Assess LLM coherence via probabilistic identities
Link LLMs to implicit Bayesian inference
Compare LLM judgments to the Bayesian Sampler model of human judgment
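The Bayesian Sampler comparison can be sketched in simulation. Under that account, a judgment is formed by drawing a small number of mental samples of the event and regularizing the sample count with a symmetric prior, giving an estimate of the form (S + β)/(N + 2β). The sample size N and prior weight β below are illustrative assumptions, not values from the paper; the point is that repeated judgments are most variable for events near p = 0.5, reproducing the inverted-U mean-variance relationship.

```python
import random

def bayesian_sampler_judgment(p, n_samples=10, beta=1.0, rng=random):
    """One judgment: count successes in n_samples draws, regularize with beta."""
    s = sum(rng.random() < p for _ in range(n_samples))
    return (s + beta) / (n_samples + 2 * beta)

def mean_and_variance(p, repeats=20000):
    """Mean and variance of repeated judgments of the same event."""
    rng = random.Random(0)
    js = [bayesian_sampler_judgment(p, rng=rng) for _ in range(repeats)]
    m = sum(js) / len(js)
    v = sum((j - m) ** 2 for j in js) / len(js)
    return m, v

# Variance peaks for mid-range probabilities: the inverted-U signature.
for p in (0.1, 0.5, 0.9):
    m, v = mean_and_variance(p)
    print(f"true p={p:.1f}  mean judgment={m:.3f}  variance={v:.4f}")
```

The same simulation also shows the conservatism the regularizing prior induces: mean judgments of extreme events are pulled toward 0.5, another pattern the paper reports as shared between humans and LLMs.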