Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Date: 2026-06-11
Citations: 0
Influential: 0
AI Summary
This work addresses a key limitation of traditional retrieval-augmented generation (RAG) systems, which rely on semantic similarity and thus struggle with complex tasks requiring analogical reasoning: superficially similar problems may demand divergent solutions, while dissimilar ones can share underlying reasoning patterns. To overcome this, the authors propose the RA-RFT framework, which retrieves contexts by reasoning utility and trains a reasoning-aware retriever via gold-relevance distillation. The approach further uses diverse analogical examples as complementary reasoning scaffolds and optimizes reasoning trajectories through retrieval-augmented reinforcement fine-tuning guided by verifiable outcome rewards. Evaluated on multiple mathematical reasoning benchmarks, RA-RFT outperforms standard reinforcement fine-tuning methods, improving AIME 2025 average@32 accuracy over GRPO by 7.1 and 2.8 percentage points for Qwen3-1.7B and Qwen3-4B, respectively.
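For readers unfamiliar with the average@32 figure, a minimal sketch follows. It assumes the common convention that k responses are sampled per problem, each is marked correct or incorrect, and the per-problem accuracies are averaged over the benchmark. The function name and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_at_k(correct_flags_per_problem):
    """Average per-problem accuracy over k sampled responses.

    correct_flags_per_problem: one list of booleans per problem,
    each list holding the correctness of the k sampled responses.
    (Assumed convention; the paper's exact protocol is not given here.)
    """
    per_problem = [
        mean(1.0 if flag else 0.0 for flag in flags)
        for flags in correct_flags_per_problem
    ]
    return mean(per_problem)


# Toy data: 2 problems with k = 4 samples each (k = 32 in the reported setting).
toy_flags = [
    [True, False, False, True],   # 2 of 4 correct
    [False, False, False, True],  # 1 of 4 correct
]
score = average_at_k(toy_flags)
print(f"average@4 = {score:.3f}")
```

Under this reading, a gain of 7.1 points means the averaged score rises by 0.071 on a 0 to 1 scale.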
Abstract
Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited to complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, while a superficially different problem may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT uses gold-relevance distillation to train a retriever that ranks contexts by expected reasoning benefit rather than semantic overlap, and then fine-tunes the policy model via reinforcement fine-tuning methods with retrieved analogous demonstrations, so the model learns to leverage reasoning traces under verifiable outcome rewards. We further analyze the diversity of retrieved contexts and find that reasoning-aware retrieval surfaces complementary solution strategies that provide distinct reasoning scaffolds for individual problems. Across challenging mathematical reasoning benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning methods. For example, it improves AIME 2025 average@32 accuracy by 7.1 and 2.8 points over GRPO for Qwen3-1.7B and Qwen3-4B, respectively, suggesting that reasoning-aware retrieval is a complementary axis of improvement, orthogonal to advances in reward design or training curricula.
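The idea of ranking contexts by expected reasoning benefit rather than semantic overlap can be illustrated with a small sketch. The abstract does not say how gold relevance is computed, so this sketch assumes one plausible definition: the utility of a candidate context is the change in solve rate when that context is added to the prompt. Every name here (`utility_labels`, `rank_by_utility`, `toy_solve_rate`, the toy rates) is hypothetical, and the solver is a lookup-table stub rather than a real model.

```python
from typing import Callable, Dict, List, Optional


def utility_labels(
    problem: str,
    candidates: List[str],
    solve_rate: Callable[[str, Optional[str]], float],
) -> Dict[str, float]:
    """Label each candidate by its solve-rate gain over the no-context baseline.

    Assumed definition of reasoning utility; not confirmed by the source.
    """
    baseline = solve_rate(problem, None)
    return {c: solve_rate(problem, c) - baseline for c in candidates}


def rank_by_utility(labels: Dict[str, float]) -> List[str]:
    """Order candidates from most to least helpful."""
    return sorted(labels, key=labels.get, reverse=True)


# Stub solver: fixed solve rates standing in for sampled model rollouts.
TOY_RATES = {None: 0.25, "similar_wording": 0.25, "same_strategy": 0.75}


def toy_solve_rate(problem: str, context: Optional[str]) -> float:
    return TOY_RATES[context]


labels = utility_labels(
    "toy problem", ["similar_wording", "same_strategy"], toy_solve_rate
)
ranking = rank_by_utility(labels)
print(labels)
print(ranking)
```

In this toy case the context that merely resembles the problem in wording earns zero utility, while the one sharing the solution strategy ranks first. Labels of this kind could serve as distillation targets for a retriever, though the paper's actual training signal may differ.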
Problem

Research questions and friction points this paper is trying to address.

reasoning by analogy
retrieval-augmented generation
complex reasoning
retrieval relevance
mathematical reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Reinforcement Fine-Tuning
reasoning by analogy
gold-relevance distillation
reasoning-aware retrieval
reinforcement fine-tuning