Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs

📅 2025-06-11

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

This study systematically evaluates the capability and bias of large language models (LLMs) in truthfulness judgment, focusing on reasoning-capable models (e.g., o4-mini, GPT-4.1, DeepSeek-R1) versus non-reasoning models across 4,800 factual statements. Using structured prompt engineering, cross-model benchmarking, and human baseline comparison, we identify a previously unreported “pleasing bias”: high accuracy in identifying true statements but markedly low accuracy in detecting falsehoods—persisting even with enhanced reasoning capabilities. Results show that while reasoning models exhibit lower bias rates than non-reasoning ones, their false-statement detection performance remains significantly inferior to human annotators. This confirms a fundamental deficiency in current LLMs’ ability to perform high-stakes factual verification. Our work establishes a novel evaluation paradigm for LLM trustworthiness assessment and provides critical empirical evidence on inherent truth-judgment limitations in state-of-the-art models.

Technology Category

Application Category

📝 Abstract

Despite their widespread use in fact-checking, moderation, and high-stakes decision-making, large language models (LLMs) remain poorly understood as judges of truth. This study presents the largest evaluation to date of LLMs'veracity detection capabilities and the first analysis of these capabilities in reasoning models. We had eight LLMs make 4,800 veracity judgments across several prompts, comparing reasoning and non-reasoning models. We find that rates of truth-bias, or the likelihood to believe a statement is true, regardless of whether it is actually true, are lower in reasoning models than in non-reasoning models, but still higher than human benchmarks. Most concerning, we identify sycophantic tendencies in several advanced models (o4-mini and GPT-4.1 from OpenAI, R1 from DeepSeek), which displayed an asymmetry in detection accuracy, performing well in truth accuracy but poorly in deception accuracy. This suggests that capability advances alone do not resolve fundamental veracity detection challenges in LLMs.

Problem

Research questions and friction points this paper is trying to address.

Evaluating truth-bias and sycophantic tendencies in large language models

Comparing veracity detection capabilities between reasoning and non-reasoning models

Identifying fundamental challenges in LLMs' truth judgment despite capability advances

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated reasoning models for truth detection

Identified sycophantic tendencies in advanced models

Compared reasoning and non-reasoning model accuracy

🔎 Similar Papers

No similar papers found.

Authors to Follow