Improving Parametric Knowledge Access in Reasoning Language Models

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies and empirically validates a critical limitation of current reasoning language models: they use their parametric world knowledge inefficiently because they lack effective internal reasoning mechanisms for improving knowledge recall. To address this, the authors propose a reinforcement learning approach that uses verifiable answer correctness as the reward signal, training the model to generate high-quality reasoning chains when answering world-knowledge questions. Training on TriviaQA improves accuracy there by 9.9%, and the gains transfer to other open-domain question-answering benchmarks: 4.2% on Natural Questions, 2.1% on HotpotQA, 0.6% on SimpleQA, and 3.0% on StrategyQA. Together these results demonstrate a substantially improved ability to access and reason over internal knowledge.

📝 Abstract
We study reasoning for accessing world knowledge stored in a language model's parameters. For example, recalling that Canberra is Australia's capital may benefit from thinking through major cities and the concept of purpose-built capitals. While reasoning language models are trained via reinforcement learning to produce reasoning traces on tasks such as mathematics, they may not reason well for accessing their own world knowledge. We first find that models do not generate their best world knowledge reasoning by default: adding a simple "think step-by-step" cue yields a statistically significant improvement in knowledge recall but not in math. Motivated by this, we propose training models to reason over their parametric knowledge using world-knowledge question answering as a verifiable reward. After reinforcement learning on TriviaQA (+9.9%), performance also improves on Natural Questions, HotpotQA, SimpleQA, and StrategyQA by 4.2%, 2.1%, 0.6%, and 3.0%, respectively. Reasoning models are under-optimized for parametric knowledge access, but can be easily trained to reason better.
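The "verifiable reward" the abstract describes is, at its core, a check of the model's final answer against the benchmark's gold answers. A minimal sketch of such a reward is below, assuming normalized exact-match scoring (a common convention for TriviaQA-style evaluation); the paper's exact reward implementation is not shown here, and the function names are illustrative.

```python
# Sketch of a verifiable answer-correctness reward for QA-based RL.
# Assumption: SQuAD/TriviaQA-style normalization (lowercase, strip
# punctuation and articles) followed by exact match against gold aliases.
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, remove articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def reward(model_answer: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the model's final answer matches any gold alias, else 0.0."""
    pred = normalize(model_answer)
    return 1.0 if any(normalize(gold) == pred for gold in gold_answers) else 0.0
```

During RL training, this binary reward is attached to the full sampled trace (reasoning chain plus final answer), so the policy is optimized to produce whatever intermediate reasoning makes the verifiable final answer more likely to be correct.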
Problem

Research questions and friction points this paper is trying to address.

reasoning language models
parametric knowledge access
world knowledge recall
knowledge reasoning
language model parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

parametric knowledge access
reasoning language models
reinforcement learning
knowledge recall
step-by-step reasoning