SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches to issue resolving with open-source large language models (LLMs) suffer from weak generalization and inefficient use of open-source development resources, while frameworks built on commercial models incur high costs and privacy risks. Method: This paper proposes Subtask-oriented Reinforced Fine-Tuning (SoRFT), a training paradigm that decomposes issue resolving into four structured subtasks: file localization, function localization, line localization, and code edit generation. It combines rejection-sampled supervised fine-tuning—in which chain-of-thought (CoT) data is filtered against ground truth before training—with rule-based reinforcement learning via Proximal Policy Optimization (PPO), where rewards are explicitly grounded in ground-truth correctness. Contribution/Results: The resulting model, SoRFT-Qwen-7B, resolves 21.4% of issues on SWE-Bench Verified, achieving state-of-the-art performance among open-source models. It also demonstrates improved generalization, validating SoRFT as a cost-efficient, privacy-preserving alternative to commercial models.

📝 Abstract
Mainstream issue-resolving frameworks predominantly rely on commercial models, leading to high costs and privacy concerns. Existing training approaches for issue resolving struggle with poor generalization and fail to fully leverage open-source development resources. We propose Subtask-oriented Reinforced Fine-Tuning (SoRFT), a novel training approach to enhance the issue-resolving capability of LLMs. SoRFT decomposes issue resolving into structured subtasks: file localization, function localization, line localization, and code edit generation. SoRFT consists of two training stages: (1) rejection-sampled supervised fine-tuning, in which Chain of Thought (CoT) data is filtered using ground truth before fine-tuning the LLM, and (2) rule-based reinforcement learning, which leverages PPO with ground-truth-based rewards. We evaluate the SoRFT-trained model on SWE-Bench Verified and SWE-Bench Lite, achieving state-of-the-art (SOTA) performance among open-source models (e.g., resolving 21.4% of issues on SWE-Bench Verified with SoRFT-Qwen-7B). The experimental results demonstrate that SoRFT significantly enhances issue-resolving performance, improves model generalization, and provides a cost-efficient alternative to commercial models.
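The first training stage described in the abstract, rejection-sampled SFT, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `generate_cot` callable, the answer-extraction heuristic, and the sample format are all assumptions.

```python
def extract_answer(cot: str) -> str:
    """Heuristic (assumed): treat the final line of a chain-of-thought
    as the proposed answer for a subtask (e.g. a file path)."""
    return cot.strip().splitlines()[-1].strip()


def rejection_sample(prompts, ground_truths, generate_cot, n_samples=8):
    """Sample n_samples CoT responses per prompt and keep only those
    whose extracted answer matches the ground truth, yielding a
    truth-filtered SFT dataset."""
    kept = []
    for prompt, truth in zip(prompts, ground_truths):
        for _ in range(n_samples):
            cot = generate_cot(prompt)
            if extract_answer(cot) == truth:
                kept.append({"prompt": prompt, "response": cot})
    return kept
```

Filtering against ground truth before fine-tuning is what distinguishes this from plain SFT: only reasoning traces that arrive at a verified answer enter the training set.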
Problem

Research questions and friction points this paper is trying to address.

Limited issue-resolving capability of open-source LLMs
Poor generalization of existing training approaches
High costs and privacy concerns of commercial-model frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes issue resolving into structured subtasks
Utilizes rejection-sampled supervised fine-tuning
Implements rule-based reinforcement learning with PPO
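The rule-based rewards used in the PPO stage can be illustrated with a sketch for the file-localization subtask. The choice of F1 overlap against the ground-truth edited files is an assumption for illustration; the paper's exact reward rules may differ.

```python
def file_localization_reward(predicted, ground_truth):
    """Assumed rule-based reward: F1 overlap between the set of files
    the model predicts and the set of files edited in the ground-truth
    patch. Returns a score in [0, 1] usable as a PPO reward."""
    pred, gold = set(predicted), set(ground_truth)
    if not pred or not gold:
        return 0.0
    hits = len(pred & gold)
    precision = hits / len(pred)
    recall = hits / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the reward is computed by a deterministic rule against ground truth rather than a learned reward model, it cannot be exploited by reward hacking in the way learned rewards can.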