When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

📅 2025-10-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Long-context language models (LCLMs) accept inputs of hundreds of thousands of tokens, yet naïvely concatenating retrieved documents fails to capture the dependencies between pieces of evidence that multi-hop reasoning requires. To address this, we propose Thought Template Augmented LCLMs (ToTAL), which separates factual grounding from logical inference by explicitly modeling reasoning paths as reusable natural-language thought templates that can be refined iteratively. ToTAL builds a template cache from reasoning trajectories on training data and refines it with natural-language feedback. This lets templates transfer across tasks and models and makes the reused reasoning transparent and interpretable. The optimized templates can also be distilled into smaller open-source models. On multi-hop QA benchmarks, ToTAL consistently outperforms strong baselines in both retrieval-based and retrieval-free settings, indicating that its thought templates generalize across benchmarks and LCLM families.
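The template cache mentioned above can be pictured with a minimal sketch. Everything in it is an assumption for illustration: the `ThoughtTemplate` and `TemplateCache` names, the rule of caching only traces that reached the correct answer, and the crude "abstraction" step that swaps known entity strings for placeholders. The source only says templates are derived from prior problem-solving traces; how that derivation works is not specified here, so the string substitution below is a stand-in.

```python
from dataclasses import dataclass


@dataclass
class ThoughtTemplate:
    # Hypothetical structure: a reusable, entity-free list of reasoning steps.
    name: str
    steps: list


class TemplateCache:
    """Illustrative cache of thought templates built from solved training traces."""

    def __init__(self):
        self._by_name = {}

    @staticmethod
    def abstract_step(step, entities):
        # Stand-in for template extraction: replace concrete entities with
        # placeholders so the step can be reused on other questions.
        for i, entity in enumerate(entities):
            step = step.replace(entity, f"<E{i}>")
        return step

    def add_trace(self, name, steps, entities, correct):
        # Assumption: only traces that reached the correct answer are cached.
        if not correct:
            return None
        abstracted = [self.abstract_step(s, entities) for s in steps]
        # The first trace stored under a name wins; later ones are ignored.
        return self._by_name.setdefault(name, ThoughtTemplate(name, abstracted))

    def get(self, name):
        return self._by_name.get(name)

    def __len__(self):
        return len(self._by_name)


cache = TemplateCache()
cache.add_trace(
    name="bridge-entity",
    steps=[
        "Find the director of Inception.",
        "Find the birthplace of Christopher Nolan.",
    ],
    entities=["Inception", "Christopher Nolan"],
    correct=True,
)
cache.add_trace("discarded", ["Guess."], [], correct=False)
print(cache.get("bridge-entity").steps)
```

Stripping the entities is what makes a trace reusable: the cached steps now describe a two-hop "bridge entity" pattern rather than one specific question.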

📝 Abstract

Recent Long-Context Language Models (LCLMs) can process hundreds of thousands of tokens in a single prompt, enabling new opportunities for knowledge-intensive multi-hop reasoning by integrating large sets of retrieved documents or, in some cases, all necessary information directly. However, simply feeding more documents into the context window fails to capture how evidence should be connected. We address this gap with thought templates, which recast reasoning as reusable thought caches derived from prior problem-solving traces; the templates structure how evidence is combined and guide multi-hop inference over factual documents. To keep these templates effective, we propose an update strategy that iteratively refines templates derived from training data through natural-language feedback. Across diverse benchmarks and LCLM families, our approach delivers consistent gains over strong baselines in both retrieval-based and retrieval-free settings. Furthermore, we show that optimized templates can be distilled into smaller open-source models, demonstrating the approach's broad applicability and transparent reasoning reuse. We refer to our framework as Thought Template Augmented LCLMs (ToTAL).
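The abstract describes templates as structure laid over the factual documents in the context window. One way to picture this is a single prompt that places cached templates alongside the document set. The section layout, the closing instruction, and the `build_prompt` helper are hypothetical; the paper's actual prompt format is not given here.

```python
def build_prompt(templates, documents, question):
    """Assemble a hypothetical template-augmented prompt for a long-context model."""
    parts = ["# Thought templates"]
    for name, steps in templates.items():
        parts.append(f"## {name}")
        # Number the steps so the model can refer to them while reasoning.
        parts.extend(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    parts.append("# Documents")
    for doc_id, text in documents:
        parts.append(f"[{doc_id}] {text}")
    parts.append("# Question")
    parts.append(question)
    parts.append("Pick a relevant template, follow its steps, and cite document ids.")
    return "\n".join(parts)


templates = {
    "bridge-entity": [
        "Identify the entity that links the question to a second document.",
        "Look up the requested attribute of that linking entity.",
    ]
}
documents = [
    ("d1", "Inception was directed by Christopher Nolan."),
    ("d2", "Christopher Nolan was born in London."),
]
prompt = build_prompt(templates, documents, "Where was the director of Inception born?")
print(prompt)
```

Keeping templates in their own section mirrors the split between reasoning structure and factual grounding. In a retrieval-free setting, the document list would hold all the necessary material rather than retrieved passages.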

Problem

Research questions and friction points this paper is trying to address.

Addressing LCLMs' failure to connect evidence across documents in long contexts

Structuring multi-hop inference with reusable thought templates

Refining reasoning templates through iterative feedback updates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reusable thought caches structure evidence combination

Iterative template refinement via natural-language feedback

Distilling optimized templates into smaller open-source models
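The iterative refinement listed above can be read as an evaluate, critique, rewrite loop. The sketch below assumes that loop shape. `refine_templates`, the per-template grouping of feedback, the stopping rule, and the three toy callables are all hypothetical. The toy callables stand in for LLM calls and are not the paper's procedure.

```python
from collections import defaultdict


def refine_templates(templates, train_set, solve, critique, rewrite, max_rounds=3):
    """Hypothetical evaluate -> critique -> rewrite loop over thought templates."""
    # Work on a copy so the caller's templates are left untouched.
    templates = {name: list(steps) for name, steps in templates.items()}
    error_counts = []
    for _ in range(max_rounds):
        feedback = defaultdict(list)
        for ex in train_set:
            name = ex["template"]
            prediction = solve(templates[name], ex)
            if prediction != ex["answer"]:
                # Collect natural-language feedback on why this template failed.
                feedback[name].append(critique(templates[name], ex, prediction))
        error_counts.append(sum(len(notes) for notes in feedback.values()))
        if not feedback:
            break  # every training example is solved; stop refining
        for name, notes in feedback.items():
            templates[name] = rewrite(templates[name], notes)
    return templates, error_counts


# Toy stand-ins for LLM calls (assumptions, for illustration only).
def toy_solve(steps, ex):
    # Pretend the model completes the second hop only if a step mentions a bridge entity.
    return ex["answer"] if any("bridge" in s for s in steps) else ex["first_hop"]


def toy_critique(steps, ex, prediction):
    return "Resolve the bridge entity before looking up the final attribute."


def toy_rewrite(steps, notes):
    # Insert each distinct note as a new step just before the final step.
    new_steps = list(steps)
    for note in dict.fromkeys(notes):
        new_steps.insert(max(len(new_steps) - 1, 0), note)
    return new_steps


initial = {"bridge-entity": ["Read the question.", "Look up the requested attribute."]}
train_set = [
    {"template": "bridge-entity", "first_hop": "Christopher Nolan", "answer": "London"},
    {"template": "bridge-entity", "first_hop": "Barack Obama", "answer": "Honolulu"},
]
refined, error_counts = refine_templates(initial, train_set, toy_solve, toy_critique, toy_rewrite)
print(error_counts)  # [2, 0]
print(refined["bridge-entity"])
```

Grouping feedback per template means a template is rewritten once per round, even when several training examples fail for the same reason.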

🔎 Similar Papers

Semantic Self-Consistency: Enhancing Language Model Reasoning via Semantic Weighting