🤖 AI Summary
This work addresses a key limitation of traditional retrieval-augmented generation (RAG) systems. Because they retrieve by semantic similarity, they struggle with complex tasks that require analogical reasoning, where superficially similar problems may demand divergent solutions while dissimilar ones can share the same underlying reasoning pattern. To overcome this, the authors propose RA-RFT (Retrieval-Augmented Reinforcement Fine-Tuning), a framework that retrieves examples by their reasoning utility rather than their semantic overlap, using a reasoning-aware retriever trained via gold-relevance distillation. The retrieved analogical examples are diverse and serve as complementary reasoning scaffolds, and the policy's reasoning trajectories are then optimized through retrieval-augmented reinforcement fine-tuning guided by verifiable outcome rewards. On multiple mathematical reasoning benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning; on AIME 2025 it improves average@32 accuracy over GRPO by 7.1 percentage points for Qwen3-1.7B and 2.8 percentage points for Qwen3-4B.
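The summary does not say how gold relevance is computed or distilled, so the following is only a minimal sketch under stated assumptions. It assumes (hypothetically) that a candidate example's gold relevance is its reasoning utility, measured as the policy's accuracy on the query with that example in context minus its accuracy with no context, and that distillation means minimizing KL(target || retriever) between a temperature-softened softmax over gold relevance and a softmax over the retriever's scores. The names `gold_relevance`, `distillation_loss`, `distill_step`, the temperature, and all numbers are illustrative, not from the paper; a real retriever would compute scores from query and candidate embeddings instead of updating free scores as done here.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gold_relevance(acc_with: Sequence[float], acc_without: float) -> list[float]:
    """Hypothetical reasoning utility: accuracy gain from adding each candidate to the prompt."""
    return [a - acc_without for a in acc_with]


def distillation_loss(scores: Sequence[float], gold: Sequence[float], tau_gold: float = 0.1) -> float:
    """KL(target || retriever) for one query, where target = softmax(gold / tau_gold)."""
    target = softmax(gold, tau_gold)
    pred = softmax(scores)
    return sum(t * (math.log(t) - math.log(p)) for t, p in zip(target, pred) if t > 0)


def distill_step(scores: Sequence[float], gold: Sequence[float],
                 lr: float = 1.0, tau_gold: float = 0.1) -> list[float]:
    """One gradient-descent step on the scores; dKL/dscore_i = pred_i - target_i."""
    target = softmax(gold, tau_gold)
    pred = softmax(scores)
    return [s - lr * (p - t) for s, p, t in zip(scores, pred, target)]


# Toy query with three candidate demonstrations (all numbers are made up).
acc_without = 0.25              # policy accuracy with no retrieved context
acc_with = [0.75, 0.30, 0.10]   # candidate 0 helps most, candidate 2 hurts
gold = gold_relevance(acc_with, acc_without)

# Initial retriever scores (e.g. semantic similarity) favour candidate 1.
scores = [0.2, 0.9, 0.8]
initial_loss = distillation_loss(scores, gold)
for _ in range(200):
    scores = distill_step(scores, gold)

print("loss before/after:", round(initial_loss, 4), round(distillation_loss(scores, gold), 4))
print("top candidate after distillation:", scores.index(max(scores)))
```

In this toy run the retriever starts out preferring candidate 1, the assumed semantic match, and distillation moves candidate 0, the example that actually raises accuracy, to the top. Matching a soft target rather than a single hard label also preserves the relative utility of the remaining candidates.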
Abstract
Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited to complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, while a superficially different problem may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT uses gold-relevance distillation to train a retriever that ranks contexts by expected reasoning benefit rather than semantic overlap, and then fine-tunes the policy model via reinforcement fine-tuning with retrieved analogous demonstrations in context, so that the model learns to leverage their reasoning traces under verifiable outcome rewards. We further analyze the diversity of retrieved contexts and find that reasoning-aware retrieval surfaces complementary solution strategies that provide distinct reasoning scaffolds for individual problems. Across challenging mathematical reasoning benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning methods. For example, it improves AIME 2025 average@32 accuracy over GRPO by 7.1 points for Qwen3-1.7B and 2.8 points for Qwen3-4B. These results suggest that reasoning-aware retrieval is a complementary axis of improvement, orthogonal to advances in reward design or training curricula.
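GRPO, the baseline named above, samples a group of completions per prompt, scores each with a verifiable outcome reward, and standardizes the rewards within the group to obtain advantages; average@32 is commonly the mean correctness over 32 sampled answers per problem, averaged across problems. The sketch below illustrates both computations on the same binary reward. It assumes exact string matching of an already-extracted final answer and a population standard deviation; real answer checkers and GRPO implementations differ in these details (some variants drop the standard-deviation normalization). In RA-RFT the sampled prompts would also contain retrieved demonstrations, which this sketch leaves out. The function names are hypothetical.

```python
import statistics
from typing import Sequence


def verifiable_reward(predicted_answer: str, gold_answer: str) -> float:
    """Binary outcome reward: 1.0 if the extracted final answer matches the reference."""
    return 1.0 if predicted_answer.strip() == gold_answer.strip() else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one prompt's sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # 0.0 when every sample gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


def average_at_k(rewards_per_problem: Sequence[Sequence[float]]) -> float:
    """average@k: mean correctness over k samples per problem, averaged over problems."""
    per_problem = [statistics.fmean(rs) for rs in rewards_per_problem]
    return statistics.fmean(per_problem)


# Toy group of 4 sampled answers for a problem whose reference answer is "204".
answers = ["204", "210", " 204", "17"]
rewards = [verifiable_reward(a, "204") for a in answers]
print(rewards)                             # [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # correct samples get positive advantage

# average@k over two problems (k = 4 here; the reported metric uses k = 32).
print(average_at_k([rewards, [1.0, 1.0, 1.0, 0.0]]))  # 0.625
```

Note that a group in which every sample receives the same reward yields zero advantage, so such prompts contribute no learning signal under this formulation.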