Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit “overthinking” in chain-of-thought (CoT) reasoning—repeatedly verifying correct answers due to self-doubt induced by excessive reliance on input prompts and internal uncertainty. This work is the first to formally attribute overthinking to these dual causes and proposes a lightweight, fine-tuning-free prompting paradigm grounded in problem credibility assessment. Our method employs multi-step prompt engineering: (1) problem validity detection to filter ill-posed or ambiguous queries, followed by (2) conditional concise response generation that suppresses redundant verification steps. Evaluated across three mathematical reasoning benchmarks and four missing-premise datasets, our approach consistently reduces answer length and reasoning steps while improving accuracy across four state-of-the-art reasoning LLMs (e.g., Llama-3-70B-Instruct, Qwen2-72B-Instruct). The results demonstrate robust generalization and offer a principled, efficient pathway toward more trustworthy and computationally economical reasoning.

📝 Abstract
Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks, largely due to the adoption of Long Chain-of-Thought (Long CoT) reasoning. However, they often exhibit overthinking -- performing unnecessary reasoning steps even after arriving at the correct answer. Prior work has largely focused on qualitative analyses of overthinking through sample-based observations of long CoTs. In contrast, we present a quantitative analysis of overthinking from the perspective of self-doubt, characterized by excessive token usage devoted to re-verifying already-correct answers. We find that self-doubt significantly contributes to overthinking. In response, we introduce a simple and effective prompting method to reduce the model's over-reliance on input questions, thereby avoiding self-doubt. Specifically, we first prompt the model to question the validity of the input question, and then respond concisely based on the outcome of that evaluation. Experiments on three mathematical reasoning tasks and four datasets with missing premises demonstrate that our method substantially reduces answer length and yields significant improvements across nearly all datasets on four widely used RLLMs. Further analysis demonstrates that our method effectively minimizes the number of reasoning steps and reduces self-doubt.
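The two-step idea described in the abstract (first judge the question's validity, then answer concisely conditioned on that judgment) can be sketched as a single prompt template. This is a minimal illustration, assuming hypothetical instruction wording; the paper's actual prompt templates may differ.

```python
# Hedged sketch of the two-step prompting paradigm from the abstract:
# (1) validity check of the input question, (2) conditional concise answer.
# The instruction text below is an assumption, not the paper's exact wording.

VALIDITY_INSTRUCTION = (
    "Before solving, first judge whether the question is valid and contains "
    "all the premises needed to answer it."
)
CONCISE_INSTRUCTION = (
    "If the question is valid, answer it concisely without re-verifying a "
    "result you have already confirmed; if it is ill-posed or missing a "
    "premise, say so briefly instead of guessing."
)

def build_prompt(question: str) -> str:
    """Combine validity detection and conditional concise answering
    into one prompt for a reasoning LLM."""
    return f"{VALIDITY_INSTRUCTION}\n{CONCISE_INSTRUCTION}\n\nQuestion: {question}"

# Example usage with a missing-premise query (the clip count is unstated):
prompt = build_prompt("Natalia sold clips to 48 friends. How many clips in total?")
```

The design choice here is to front-load the credibility assessment so the model commits to a validity judgment before reasoning, which is what the paper credits with suppressing redundant re-verification.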
Problem

Research questions and friction points this paper is trying to address.

Quantifying overthinking in Long Chain-of-Thought reasoning
Reducing self-doubt in Large Language Models
Minimizing unnecessary reasoning steps in RLLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantitative analysis of overthinking via self-doubt
Prompting method that reduces over-reliance on input questions
Minimizes reasoning steps and self-doubt effectively