RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the cost of building full repository environments and the resulting scarcity of execution feedback in repository-level code generation, this paper proposes a sandbox-testing paradigm for lightweight execution environment construction. The method isolates a target function together with its minimal dependency set into a standalone script, enabling dynamic execution in an isolated sandbox that yields precise, fine-grained feedback while bypassing the scalability bottleneck of full-repository builds. Key components include dependency-aware extraction of the minimal dependency set, automated test script generation, and two datasets: RepoST-Train, a large-scale training set of 7,415 functions from 832 repositories, and RepoST-Eval, an evaluation benchmark. Training with the execution feedback from RepoST-Train improves code model performance by 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval, and 12 code generation models are benchmarked on RepoST-Eval. The proposed infrastructure enables scalable, low-coupling execution feedback for repository-level code generation.

📝 Abstract
We present RepoST, a scalable method to construct environments that provide execution feedback for repository-level code generation for both training and evaluation. Unlike existing works that aim to build entire repositories for execution, which is challenging for both human and LLMs, we provide execution feedback with sandbox testing, which isolates a given target function and its dependencies to a separate script for testing. Sandbox testing reduces the complexity of external dependencies and enables constructing environments at a large scale. We use our method to construct RepoST-Train, a large-scale train set with 7,415 functions from 832 repositories. Training with the execution feedback provided by RepoST-Train leads to a performance gain of 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval. We also build an evaluation dataset, RepoST-Eval, and benchmark 12 code generation models.
Problem

Research questions and friction points this paper is trying to address.

Scalable construction of execution-feedback environments for repository-level code generation
Testing target functions in isolation instead of building entire repositories
Improving code generation models with large-scale execution feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sandbox testing isolates target functions and their dependencies for execution feedback
Scalable environment construction with reduced external-dependency complexity
Large-scale training set (7,415 functions) that improves code generation performance
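The sandbox-testing idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the extracted dependency, the target function `softmax_weights`, and the generated test are all hypothetical placeholders standing in for code that RepoST would slice out of a real repository.

```python
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical sandbox script: the target function and its minimal
# dependency set are copied into one standalone file together with a
# generated test, so it can run without building the whole repository.
SANDBOX_SCRIPT = textwrap.dedent("""
    import math

    # --- extracted dependency (normally sliced from the repository) ---
    def _normalize(xs):
        total = sum(xs)
        return [x / total for x in xs]

    # --- target function under test ---
    def softmax_weights(xs):
        exps = [math.exp(x) for x in xs]
        return _normalize(exps)

    # --- generated test ---
    out = softmax_weights([0.0, 0.0])
    assert abs(sum(out) - 1.0) < 1e-9
    assert abs(out[0] - 0.5) < 1e-9
    print("PASS")
""")

def run_in_sandbox(script: str, timeout: float = 10.0) -> bool:
    """Write the script to a temp file and execute it in a fresh
    interpreter process, so failures are isolated from the host."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    return result.returncode == 0 and "PASS" in result.stdout

print(run_in_sandbox(SANDBOX_SCRIPT))  # → True
```

Running each test in a separate subprocess keeps the feedback signal binary and isolated: a crash, infinite loop, or assertion failure in the extracted script cannot affect other environments, which is what makes construction at the scale of thousands of functions tractable.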
👥 Authors
Yiqing Xie — Carnegie Mellon University (Natural Language Processing · Code Generation · Factuality)
Alex Xie — Carnegie Mellon University
Divyanshu Sheth — Carnegie Mellon University
Pengfei Liu — Shanghai Jiao Tong University
Daniel Fried — Carnegie Mellon University (Natural Language Processing · Machine Learning)
Carolyn Rosé — Carnegie Mellon University