Do Biased Models Have Biased Thoughts?

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether the chain-of-thought (CoT) reasoning process of large language models (LLMs) aligns with their biased outputs. Focusing on social biases, including gender and racial bias, the authors use fairness metrics to quantify 11 different biases across five mainstream LLMs under CoT prompting, assessing bias separately in the intermediate reasoning steps and in the final outputs. The results reveal only a modest correlation between bias in reasoning and bias in output (*r* < 0.6 with *p* < 0.001 in most cases), indicating that output bias does not necessarily originate from biased internal reasoning. This challenges the cognitive analogy that "bias stems from thought" and provides empirical evidence of *bias decoupling* in LLMs: bias may emerge predominantly at the output-generation stage rather than within the CoT reasoning chain. The finding suggests a shift for AI fairness interventions: optimizing output-generation mechanisms may be more effective than modifying intermediate reasoning steps.

📝 Abstract
The impressive performance of language models is undeniable. However, the presence of biases based on gender, race, socio-economic status, physical appearance, and sexual orientation makes the deployment of language models challenging. This paper studies the effect of chain-of-thought prompting, a recent approach that studies the steps followed by the model before it responds, on fairness. More specifically, we ask the following question: *Do biased models have biased thoughts?* To answer our question, we conduct experiments on 5 popular large language models using fairness metrics to quantify 11 different biases in the model's thoughts and output. Our results show that the bias in the thinking steps is not highly correlated with the output bias (less than 0.6 correlation with a *p*-value smaller than 0.001 in most cases). In other words, unlike human beings, the tested models with biased decisions do not always possess biased thoughts.
Problem

Research questions and friction points this paper is trying to address.

Investigates bias in language models' chain-of-thought processes
Measures correlation between model thoughts and output biases
Assesses fairness across 11 biases in 5 large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses chain-of-thought prompting technique
Analyzes bias in model thoughts and outputs
Tests five large language models
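A prerequisite for the analysis above is splitting each CoT response into its reasoning steps and its final answer so bias can be scored on each part separately. A minimal sketch (the `split_cot` helper and the `Answer:` marker are hypothetical, not from the paper):

```python
def split_cot(response: str, answer_marker: str = "Answer:"):
    """Split a CoT-style response into (reasoning_steps, final_answer)."""
    # Everything before the last occurrence of the marker is treated as
    # the chain of thought; everything after it is the final answer.
    thoughts, _, answer = response.rpartition(answer_marker)
    steps = [line.strip() for line in thoughts.splitlines() if line.strip()]
    return steps, answer.strip()

example = (
    "Step 1: Consider each candidate's qualifications.\n"
    "Step 2: The role requires prior management experience.\n"
    "Answer: Candidate B"
)
steps, answer = split_cot(example)
print(len(steps), answer)  # prints: 2 Candidate B
```

Once thoughts and answer are separated, the same fairness metric can be applied to each, yielding the paired bias scores whose correlation the paper examines.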