Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte Carlo Tree Search

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing repository-level code question answering methods face dual bottlenecks: in-context learning struggles to support environment-aware tool invocation and decision-making, while supervised training relies on distillation from larger models, which poses data compliance risks. This paper introduces RepoSearch-R1, the first repository-level code QA framework that requires neither external supervision nor model distillation, integrating Monte Carlo Tree Search (MCTS) with self-training reinforcement learning. Its key innovations are: (1) a cold-start self-exploration mechanism that generates high-quality, diverse reasoning trajectories without data dependency; and (2) an MCTS-guided policy that enhances reasoning completeness and training stability. Experiments demonstrate that RepoSearch-R1 improves answer completeness by 16.0% over retrieval-free baselines and by 19.5% over iterative retrieval approaches, while achieving 33% higher training efficiency than generic RL methods.

📝 Abstract
Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents effectively in tool utilization and decision-making based on environmental feedback, while training-based approaches typically rely on costly distillation from larger LLMs, introducing data compliance concerns in enterprise environments. To address these challenges, we introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte Carlo Tree Search (MCTS). This approach allows agents to generate diverse, high-quality reasoning trajectories via self-training without requiring model distillation or external supervision. Based on RepoSearch-R1, we construct a RepoQA-Agent specifically designed for repository question-answering tasks. Comprehensive evaluation on repository question-answering tasks demonstrates that RepoSearch-R1 achieves substantial improvements in answer completeness: a 16.0% enhancement over no-retrieval methods, a 19.5% improvement over iterative retrieval methods, and a 33% increase in training efficiency compared to general agentic reinforcement learning approaches. Our cold-start training methodology eliminates data compliance concerns while maintaining robust exploration diversity and answer completeness across repository-level reasoning tasks.
Problem

Research questions and friction points this paper is trying to address.

Enhancing repository QA agents through reinforcement learning without external supervision
Addressing limitations of training-free and distillation-based code analysis approaches
Improving tool utilization and decision-making in complex codebase navigation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monte Carlo Tree Search drives the reinforcement learning framework
Self-training generates reasoning trajectories without external supervision
Cold-start training eliminates data compliance concerns while maintaining diversity
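The MCTS-guided trajectory generation described above can be illustrated with a minimal UCT (Upper Confidence bounds applied to Trees) loop. This is a hedged sketch, not the paper's implementation: the `Node` structure, the candidate actions, the constant `c`, and the stand-in reward values are all illustrative assumptions. The point is only to show how the exploit/explore balance concentrates rollouts on the most promising branch, which is the mechanism the framework uses to produce diverse yet high-quality trajectories.

```python
import math
import random

class Node:
    """A node in a toy MCTS over candidate tool-call actions."""
    def __init__(self, state, parent=None):
        self.state = state        # here just a label; in practice, a partial trajectory
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # accumulated rollout reward

    def uct(self, c=1.4):
        if self.visits == 0:
            return float("inf")   # explore unvisited children first
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select(node):
    """Descend to a leaf, always taking the child with the highest UCT score."""
    while node.children:
        node = max(node.children, key=lambda n: n.uct())
    return node

def backpropagate(node, reward):
    """Add a rollout reward to every node on the path back to the root."""
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

# Toy search: three hypothetical actions whose mean rewards stand in for
# answer completeness; the search concentrates visits on the best one.
random.seed(0)
root = Node("root")
root.children = [Node(f"action_{i}", parent=root) for i in range(3)]
mean_reward = {"action_0": 0.2, "action_1": 0.9, "action_2": 0.5}

for _ in range(200):
    leaf = select(root)
    reward = mean_reward[leaf.state] + random.uniform(-0.05, 0.05)
    backpropagate(leaf, reward)

best = max(root.children, key=lambda n: n.visits)
print(best.state)
```

In the RepoSearch-R1 setting, the rollout reward would come from scoring a completed answer rather than a fixed lookup table, and the selected trajectories would feed the self-training RL loop.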
Guochang Li
Zhejiang University, China
Yuchen Liu
Alibaba Group, China
Zhen Qin
Zhejiang University, China
Yunkun Wang
Zhejiang University, China
Jianping Zhong
Zhejiang University, China
Chen Zhi
Zhejiang University, China
Binhua Li
Alibaba Group, China
Fei Huang
Alibaba Group, China
Yongbin Li
Alibaba Group, China
Shuiguang Deng
Zhejiang University, China