Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Prior work treats long-context capability primarily as a requirement for handling long inputs, overlooking its relationship with reasoning performance. Method: We conduct controlled experiments that compare models with identical architectures and supervised fine-tuning (SFT) data but different long-context capacities, in order to isolate the effect of long-context modeling on reasoning. Contribution/Results: Models with stronger long-context capacity achieve consistently higher accuracy on reasoning benchmarks after SFT, and the gains persist on tasks with short inputs. The results indicate that long-context capability benefits reasoning regardless of input length, beyond simply extending the context window, and support treating long-context modeling as a foundation for reasoning.

📝 Abstract

Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. In this work, we hypothesize that current limitations in reasoning stem, in part, from insufficient long-context capacity, motivated by empirical observations such as (1) longer context windows often lead to stronger reasoning performance, and (2) failed reasoning cases resemble failed long-context cases. To test this hypothesis, we examine whether enhancing a model's long-context ability before Supervised Fine-Tuning (SFT) leads to improved reasoning performance. Specifically, we compare models with identical architectures and fine-tuning data but varying levels of long-context capacity. Our results reveal a consistent trend: models with stronger long-context capacity achieve significantly higher accuracy on reasoning benchmarks after SFT. Notably, these gains persist even on tasks with short input lengths, indicating that long-context training offers generalizable benefits for reasoning performance. These findings suggest that long-context modeling is not just essential for processing lengthy inputs, but also serves as a critical foundation for reasoning. We advocate for treating long-context capacity as a first-class objective in the design of future language models.
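
The comparison described in the abstract (same architecture, same SFT data, different long-context capacity) can be checked with a per-length breakdown of accuracy. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the record fields (`variant`, `input_tokens`, `correct`), the variant names, the 512-token threshold, and the toy records are all hypothetical placeholders and do not come from the paper.

```python
from collections import defaultdict


def accuracy_by_length(records, short_max_tokens=512):
    """Group per-example results by (model variant, input-length bucket).

    Each record is a dict with hypothetical keys:
      variant      - label of the model variant being evaluated
      input_tokens - length of the prompt in tokens
      correct      - whether the model's final answer was right
    Returns a dict mapping (variant, bucket) to accuracy in [0, 1].
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        # Inputs at or below the threshold count as "short"; the threshold is an assumption.
        bucket = "short" if r["input_tokens"] <= short_max_tokens else "long"
        key = (r["variant"], bucket)
        totals[key] += 1
        correct[key] += int(r["correct"])
    return {key: correct[key] / totals[key] for key in totals}


# Placeholder records for illustration only; these are NOT results from the paper.
records = [
    {"variant": "weak_lc", "input_tokens": 120, "correct": False},
    {"variant": "weak_lc", "input_tokens": 200, "correct": True},
    {"variant": "weak_lc", "input_tokens": 4000, "correct": False},
    {"variant": "strong_lc", "input_tokens": 120, "correct": True},
    {"variant": "strong_lc", "input_tokens": 200, "correct": True},
    {"variant": "strong_lc", "input_tokens": 4000, "correct": True},
]

acc = accuracy_by_length(records)
for (variant, bucket), value in sorted(acc.items()):
    print(f"{variant:10s} {bucket:5s} accuracy={value:.2f}")
```

Reporting the "short" bucket separately is what lets a reader see whether a gain from stronger long-context capacity persists when inputs are short, which is the claim the abstract makes.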

Problem

Research questions and friction points this paper is trying to address.

Investigates long-context capacity's impact on reasoning performance

Tests if enhancing long-context ability improves reasoning after fine-tuning

Shows that long-context training benefits reasoning even on short inputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhancing long-context ability before SFT

Comparing models with varying context capacities

Long-context training improves reasoning generally

🔎 Similar Papers

Long-context Language Models Cannot Retrieve Without Sufficient Steps