Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones

📅 2025-05-27

🤖 AI Summary

This work studies how to allocate inference-time compute: whether to scale sequentially (extending a single chain of thought, CoT) or in parallel (sampling multiple short chains and aggregating them, e.g., by majority voting) to improve LLM reasoning. It combines a theoretical analysis with experiments. The key contribution is to demonstrate the existence of reasoning settings, based on graph connectivity problems over challenging distributions of graphs, in which sequential scaling offers an exponential advantage over parallel scaling. Experiments across a range of language models, including models trained from scratch for graph connectivity with different CoT strategies as well as large reasoning models, support the theoretical findings.

📝 Abstract

Inference-time computation has emerged as a promising scaling axis for improving large language model reasoning. However, despite yielding impressive performance, the optimal allocation of inference-time computation remains poorly understood. A central question is whether to prioritize sequential scaling (e.g., longer chains of thought) or parallel scaling (e.g., majority voting across multiple short chains of thought). In this work, we seek to illuminate the landscape of test-time scaling by demonstrating the existence of reasoning settings where sequential scaling offers an exponential advantage over parallel scaling. These settings are based on graph connectivity problems in challenging distributions of graphs. We validate our theoretical findings with comprehensive experiments across a range of language models, including models trained from scratch for graph connectivity with different chain of thought strategies as well as large reasoning models.
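The contrast between the two scaling axes can be made concrete with a toy sketch. This is not the paper's construction or experimental setup; it is a minimal illustration under stated assumptions: the "long chain" is modeled as a breadth-first search that runs to completion, each "short chain" is modeled as a random walk truncated at `max_steps` that answers "not connected" if it never reaches the target, and all function names are hypothetical.

```python
import random
from collections import Counter, deque


def long_chain_connected(adj, s, t):
    """Sequential scaling (toy): one search that keeps going until it is done."""
    seen, queue = {s}, deque([s])
    while queue:
        node = queue.popleft()
        if node == t:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def short_chain_guess(adj, s, t, max_steps, rng):
    """One short chain (toy): a random walk cut off after max_steps moves."""
    node = s
    for _ in range(max_steps):
        if node == t:
            return True
        if not adj[node]:
            return False
        node = rng.choice(adj[node])
    return node == t


def majority_vote(adj, s, t, n_chains, max_steps, seed=0):
    """Parallel scaling (toy): majority vote over independent short chains."""
    rng = random.Random(seed)
    votes = Counter(
        short_chain_guess(adj, s, t, max_steps, rng) for _ in range(n_chains)
    )
    return votes.most_common(1)[0][0]


# Assumed example graph: a path 0 - 1 - ... - 19, so node 19 is 19 hops from node 0.
n = 20
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

print(long_chain_connected(path, 0, 19))                       # True
print(majority_vote(path, 0, 19, n_chains=101, max_steps=5))   # False
```

In this toy, no walk of at most 5 moves can reach a node 19 hops away, so every short chain votes "not connected" and adding more voters never changes the answer, while the single long search succeeds. The sketch only shows why aggregating truncated chains cannot substitute for a chain that is long enough; it says nothing about the size of the gap that the paper analyzes.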

Problem

Research questions and friction points this paper is trying to address.

Optimal allocation of inference-time computation in language models

Comparing sequential vs parallel scaling for reasoning performance

Whether long chains of thought offer an exponential advantage over parallel scaling on graph connectivity problems

Innovation

Methods, ideas, or system contributions that make the work stand out.

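Identify reasoning settings where sequential scaling offers an exponential advantage over parallel scaling

Compare a single long chain of thought against majority voting across multiple short chains

Validate the theory with experiments on models trained from scratch for graph connectivity and on large reasoning models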
