A Comprehensive Evaluation of Cognitive Biases in LLMs

📅 2024-10-20
🏛️ Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
📈 Citations: 13
Influential: 2
🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit human-like cognitive biases, that is, systematic deviations from rational judgment, across 30 bias categories. Method: The authors propose a novel general-purpose, automated cognitive bias evaluation framework that integrates prompt-driven controllable scenario generation, structured test-case design, multi-model batch response collection, and systematic bias annotation and analysis. They use it to construct a benchmark dataset of 30,000 tests. Contribution/Results: An empirical evaluation of 20 state-of-the-art LLMs finds that every model exhibits at least one bias and that evidence of all 30 tested biases appears in at least some of the models. Bias prevalence and severity vary between models. The work confirms and broadens previous findings, and provides a large-scale evidence base and open evaluation infrastructure for LLM alignment, bias mitigation, and trustworthy AI development.

📝 Abstract
We present a large-scale evaluation of 30 cognitive biases in 20 state-of-the-art large language models (LLMs) under various decision-making scenarios. Our contributions include a novel general-purpose test framework for reliable and large-scale generation of tests for LLMs, a benchmark dataset with 30,000 tests for detecting cognitive biases in LLMs, and a comprehensive assessment of the biases found in the 20 evaluated LLMs. Our work confirms and broadens previous findings suggesting the presence of cognitive biases in LLMs by reporting evidence of all 30 tested biases in at least some of the 20 LLMs. We publish our framework code to encourage future research on biases in LLMs: https://github.com/simonmalberg/cognitive-biases-in-llms
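
To make the idea of a generated bias test concrete, the following is a minimal sketch of a paired control/treatment test and a simple score, assuming an anchoring-style scenario. The names `TestCase`, `make_anchoring_test`, and `bias_score`, as well as the scoring formula, are hypothetical illustrations and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TestCase:
    """One paired test: the same question with and without a bias trigger.

    Hypothetical structure for illustration only.
    """
    bias: str
    control_prompt: str
    treatment_prompt: str


def make_anchoring_test(scenario: str, anchor: int) -> TestCase:
    """Build a control prompt and a treatment prompt that adds an irrelevant number."""
    question = f"{scenario} Answer on a scale from 0 to 10."
    return TestCase(
        bias="anchoring",
        control_prompt=question,
        treatment_prompt=f"A random number was drawn: {anchor}. {question}",
    )


def bias_score(control: list[float], treatment: list[float],
               scale_max: float = 10.0) -> float:
    """Mean shift from control to treatment answers, as a fraction of the scale.

    Assumes answers lie in [0, scale_max], so the result lies in [-1, 1].
    A value near 0 suggests the trigger had no effect (assumed metric).
    """
    if not control or len(control) != len(treatment):
        raise ValueError("control and treatment must be non-empty and equal in length")
    return mean((t - c) / scale_max for c, t in zip(control, treatment))
```

In such a setup, each model would answer both prompts of every test, and the paired answers would be passed to `bias_score`; for example, `bias_score([4, 5], [6, 7])` measures how far the answers moved after the anchor was shown.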

Problem

Research questions and friction points this paper is trying to address.

Evaluates cognitive biases in large language models
Develops test framework for detecting model biases
Assesses 30 biases across 20 LLMs systematically

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel test framework for large-scale LLM evaluation
Benchmark dataset with 30,000 cognitive bias tests
Comprehensive assessment of 30 biases across 20 LLMs
Simon Malberg
Technical University of Munich
Artificial Intelligence, Machine Learning, Natural Language Processing
Roman Poletukhin
School of Computation, Information and Technology, Technical University of Munich, Germany
Carolin M. Schuster
School of Computation, Information and Technology, Technical University of Munich, Germany
Georg Groh
Adjunct Professor
Social Computing, Natural Language Processing