🤖 AI Summary
Existing repository-level code question answering methods face dual bottlenecks: in-context learning struggles to support environment-aware tool invocation and decision-making, while supervised training relies on distillation from larger models, posing data compliance risks. This paper introduces RepoSearch-R1, the first repository-level code QA framework that requires neither external supervision nor model distillation, integrating Monte Carlo Tree Search (MCTS) with self-training reinforcement learning. Its key innovations are: (1) a cold-start self-exploration mechanism that generates high-quality, diverse reasoning trajectories without external data dependencies; and (2) an MCTS-guided policy that enhances reasoning completeness and training stability. Experiments demonstrate that RepoSearch-R1 improves answer completeness by 16.0% over retrieval-free baselines and by 19.5% over iterative retrieval approaches, while achieving 33% higher training efficiency than generic agentic RL methods.
📝 Abstract
Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents effectively in tool utilization and decision-making based on environmental feedback, while training-based approaches typically rely on costly distillation from larger LLMs, introducing data compliance concerns in enterprise environments. To address these challenges, we introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte Carlo Tree Search (MCTS). This approach allows agents to generate diverse, high-quality reasoning trajectories via self-training, without requiring model distillation or external supervision. Based on RepoSearch-R1, we construct RepoQA-Agent, an agent specifically designed for repository question-answering tasks. Comprehensive evaluation on these tasks demonstrates that RepoSearch-R1 achieves substantial improvements in answer completeness: a 16.0% enhancement over no-retrieval methods, a 19.5% improvement over iterative retrieval methods, and a 33% increase in training efficiency compared to general agentic reinforcement learning approaches. Our cold-start training methodology eliminates data compliance concerns while maintaining robust exploration diversity and answer completeness across repository-level reasoning tasks.
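To make the MCTS-guided trajectory generation described above concrete, here is a minimal, self-contained sketch of the standard MCTS loop (selection via UCT, expansion, rollout, backpropagation) over a toy action space. This is not the paper's implementation: the action set (`search`, `read`, `answer`), the reward function, and the exploration constant are hypothetical stand-ins for repository tool calls and an answer-completeness score, chosen only to illustrate how the search concentrates visits on high-reward trajectories.

```python
import math
import random

class Node:
    """One node in the search tree; state is the tuple of actions taken so far."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # cumulative rollout reward

ACTIONS = ["search", "read", "answer"]  # hypothetical tool actions
MAX_DEPTH = 3

def rollout_reward(state):
    # Hypothetical stand-in for an answer-completeness score: gathering
    # context ("search", "read") and finishing with "answer" scores highest.
    score = 0.0
    if "search" in state:
        score += 0.3
    if "read" in state:
        score += 0.3
    if state and state[-1] == "answer":
        score += 0.4
    return score

def uct(parent, child, c=1.4):
    """Upper Confidence Bound for Trees: exploitation + exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def mcts(iterations=500, seed=0):
    random.seed(seed)
    root = Node(())
    for _ in range(iterations):
        # 1. Selection: descend via UCT while nodes are fully expanded.
        node = root
        while len(node.children) == len(ACTIONS) and len(node.state) < MAX_DEPTH:
            node = max(node.children.values(), key=lambda ch: uct(node, ch))
        # 2. Expansion: add one untried action, if depth allows.
        if len(node.state) < MAX_DEPTH:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout to full depth, then score it.
        state = list(node.state)
        while len(state) < MAX_DEPTH:
            state.append(random.choice(ACTIONS))
        reward = rollout_reward(tuple(state))
        # 4. Backpropagation: push the reward up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Extract the most-visited trajectory greedily from the root.
    trajectory, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        trajectory.append(action)
    return trajectory

print(mcts())
```

In a self-training setup like the one the abstract outlines, the rollout score would instead come from evaluating the agent's generated answer, and the high-visit trajectories would be kept as training data, which is what removes the need for distillation from a larger model.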