MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-driven general-purpose deep research agents face two bottlenecks in healthcare: they lack the dense medical knowledge that clinical reasoning requires, and the absence of domain-specific retrieval tools limits their performance on complex medical benchmarks. This paper introduces a knowledge-informed trajectory synthesis framework to build an expert-level medical deep researcher. First, it synthesizes long-chain, multi-hop medical QA data grounded in a curated medical knowledge graph, yielding over 2,100 high-quality reasoning trajectories spanning 12 medical specialties. Second, it develops a dedicated private medical retrieval engine that enables domain-customized tool orchestration and multi-source information fusion. Third, it employs a two-stage training paradigm: supervised fine-tuning followed by online reinforcement learning with a composite reward function. The resulting model achieves state-of-the-art performance across multiple medical benchmarks, outperforming large proprietary systems, while preserving general research capabilities.

📝 Abstract
Recent developments in Large Language Model (LLM)-based agents have shown impressive capabilities across multiple domains, exemplified by deep research systems that demonstrate superior performance on complex information-seeking and synthesis tasks. General-purpose deep research agents nevertheless struggle significantly with medical domain challenges, as evidenced by leading proprietary systems achieving limited accuracy on complex medical benchmarks. The key limitations are: (1) the model lacks sufficient dense medical knowledge for clinical reasoning, and (2) the framework is constrained by the absence of specialized retrieval tools tailored for medical contexts.

We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework using medical knowledge graphs, extracting the longest chains from subgraphs around rare medical entities to generate complex multi-hop question-answer pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate medical information synthesis. Our approach generates 2100+ diverse trajectories across 12 medical specialties, each averaging 4.2 tool interactions.

Through a two-stage training paradigm combining supervised fine-tuning and online reinforcement learning with composite rewards, our MedResearcher-R1-32B model establishes new state-of-the-art results on medical benchmarks while maintaining competitive performance on general deep research tasks. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains.
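
To make the longest-chain idea in the abstract concrete, the following is a minimal sketch of extracting the longest relation chain from the subgraph around a seed entity. It is an illustration under stated assumptions, not the paper's pipeline: the toy triples, the relation names, the `max_hops` radius, and the helper names (`build_graph`, `neighborhood`, `longest_chain`) are all hypothetical.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples. Illustrative only; they are not taken
# from the paper's knowledge graph.
TRIPLES = [
    ("Fabry disease", "caused_by_deficiency_of", "alpha-galactosidase A"),
    ("alpha-galactosidase A", "encoded_by", "GLA gene"),
    ("GLA gene", "located_on", "X chromosome"),
    ("Fabry disease", "treated_with", "agalsidase beta"),
]


def build_graph(triples):
    """Index triples as a directed adjacency list: head -> [(relation, tail)]."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph


def neighborhood(graph, seed, max_hops):
    """Return the nodes reachable from seed within max_hops directed edges."""
    seen = {seed}
    frontier = [seed]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for _, tail in graph.get(node, []):
                if tail not in seen:
                    seen.add(tail)
                    next_frontier.append(tail)
        frontier = next_frontier
    return seen


def longest_chain(graph, seed, max_hops=3):
    """Longest simple path starting at seed, restricted to its neighborhood."""
    allowed = neighborhood(graph, seed, max_hops)
    best = []

    def dfs(node, path, visited):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for rel, tail in graph.get(node, []):
            if tail in allowed and tail not in visited:
                visited.add(tail)
                path.append((node, rel, tail))
                dfs(tail, path, visited)
                path.pop()
                visited.remove(tail)

    dfs(seed, [], {seed})
    return best


chain = longest_chain(build_graph(TRIPLES), "Fabry disease")
for head, rel, tail in chain:
    print(f"{head} --{rel}--> {tail}")
```

In a sketch like this, the final tail of the chain could serve as the answer to a multi-hop question whose wording hides the intermediate entities. Exhaustive depth-first search is only practical on small subgraphs, which is why the search is bounded by a hop radius here.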
Problem

Research questions and friction points this paper is trying to address.

Addresses medical domain limitations in deep research agents
Overcomes insufficient medical knowledge for clinical reasoning
Integrates specialized retrieval tools for medical information synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Medical knowledge graph data synthesis framework
Custom private medical retrieval engine integration
Two-stage training with fine-tuning and reinforcement learning
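
As a rough illustration of what a composite reward for the reinforcement learning stage might look like, the sketch below combines several per-trajectory signals into one scalar. The summary does not specify the reward components, so the three terms used here (answer accuracy, tool-call efficiency, output format), their weights, and the names `Trajectory` and `composite_reward` are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical summary of one agent rollout (fields are assumptions)."""
    answer_correct: bool  # final answer matches the reference answer
    tool_calls: int       # number of tool interactions in the rollout
    well_formed: bool     # output follows the expected response format


def composite_reward(traj, max_tool_calls=8, w_acc=1.0, w_eff=0.2, w_fmt=0.1):
    """Weighted sum of assumed reward terms; the weights are illustrative."""
    accuracy = 1.0 if traj.answer_correct else 0.0
    # Fewer tool calls earn a higher efficiency score, floored at zero.
    efficiency = max(0.0, 1.0 - traj.tool_calls / max_tool_calls)
    fmt = 1.0 if traj.well_formed else 0.0
    return w_acc * accuracy + w_eff * efficiency + w_fmt * fmt


good = composite_reward(Trajectory(answer_correct=True, tool_calls=4, well_formed=True))
bad = composite_reward(Trajectory(answer_correct=False, tool_calls=10, well_formed=True))
print(good, bad)
```

A weighted sum keeps each signal inspectable on its own, but the relative weights strongly shape what the policy optimizes, so any real choice would need tuning against the target benchmarks.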
Authors

Ailing Yu, Ant Group
Lan Yao, Harbin Institute of Technology
Jingnan Liu, Ant Group
Zhe Chen, Ant Group
Jiajun Yin, Ant Group
Yuan Wang, Ant Group
Xinhao Liao, Ant Group
Zhiling Ye, Ant Group
Ji Li, Principal Group Science Manager at Microsoft (AI, CAD)
Yun Yue, Ant Group (AI, machine learning)
Hansong Xiao, Ant Group
Hualei Zhou, Ant Group
Chunxiao Guo, Ant Group
Peng Wei, Ant Group
Jinjie Gu, Ant Group (machine learning, recommendation)