🤖 AI Summary
Existing repository-level code question answering methods face dual bottlenecks: in-context learning struggles to support environment-aware tool invocation and decision-making, while supervised training relies on distillation from larger models, posing data compliance risks. This paper introduces RepoSearch-R1, the first repository-level code QA framework that requires neither external supervision nor model distillation, integrating Monte Carlo Tree Search (MCTS) with self-training reinforcement learning. Its key innovations are: (1) a cold-start self-exploration mechanism that generates high-quality, diverse reasoning trajectories without external data dependencies; and (2) an MCTS-guided policy that enhances reasoning completeness and training stability. Experiments demonstrate that RepoSearch-R1 improves answer completeness by 16.0% over retrieval-free baselines and by 19.5% over iterative retrieval approaches, while achieving 33% higher training efficiency than generic agentic RL methods.
📝 Abstract
Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents effectively in tool utilization and decision-making based on environmental feedback, while training-based approaches typically rely on costly distillation from larger LLMs, introducing data compliance concerns in enterprise environments. To address these challenges, we introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte Carlo Tree Search (MCTS). This approach allows agents to generate diverse, high-quality reasoning trajectories via self-training, without requiring model distillation or external supervision. Based on RepoSearch-R1, we construct RepoQA-Agent, an agent specifically designed for repository question-answering tasks. Comprehensive evaluation on these tasks demonstrates that RepoSearch-R1 achieves substantial improvements in answer completeness: a 16.0% enhancement over no-retrieval methods, a 19.5% improvement over iterative retrieval methods, and a 33% increase in training efficiency compared to general agentic reinforcement learning approaches. Our cold-start training methodology eliminates data compliance concerns while maintaining robust exploration diversity and answer completeness across repository-level reasoning tasks.
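To make the MCTS-guided trajectory generation described above concrete, here is a minimal, self-contained sketch of the standard MCTS loop (selection via UCT, expansion, rollout, backpropagation) over a toy action space. This is not the paper's implementation: the action set (`search`, `read`, `answer`), the reward function, and the exploration constant are hypothetical stand-ins for repository tool calls and an answer-completeness score, chosen only to illustrate how the search concentrates visits on high-reward trajectories.

```python
import math
import random

class Node:
    """One node in the search tree; state is the tuple of actions taken so far."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # cumulative rollout reward

ACTIONS = ["search", "read", "answer"]  # hypothetical tool actions
MAX_DEPTH = 3

def rollout_reward(state):
    # Hypothetical stand-in for an answer-completeness score: gathering
    # context ("search", "read") and finishing with "answer" scores highest.
    score = 0.0
    if "search" in state:
        score += 0.3
    if "read" in state:
        score += 0.3
    if state and state[-1] == "answer":
        score += 0.4
    return score

def uct(parent, child, c=1.4):
    """Upper Confidence Bound for Trees: exploitation + exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def mcts(iterations=500, seed=0):
    random.seed(seed)
    root = Node(())
    for _ in range(iterations):
        # 1. Selection: descend via UCT while nodes are fully expanded.
        node = root
        while len(node.children) == len(ACTIONS) and len(node.state) < MAX_DEPTH:
            node = max(node.children.values(), key=lambda ch: uct(node, ch))
        # 2. Expansion: add one untried action, if depth allows.
        if len(node.state) < MAX_DEPTH:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout to full depth, then score it.
        state = list(node.state)
        while len(state) < MAX_DEPTH:
            state.append(random.choice(ACTIONS))
        reward = rollout_reward(tuple(state))
        # 4. Backpropagation: push the reward up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Extract the most-visited trajectory greedily from the root.
    trajectory, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        trajectory.append(action)
    return trajectory

print(mcts())
```

In a self-training setup like the one the abstract outlines, the rollout score would instead come from evaluating the agent's generated answer, and the high-visit trajectories would be kept as training data, which is what removes the need for distillation from a larger model.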