Improving Merge Pipeline Throughput in Continuous Integration via Pull Request Prioritization

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In large monorepos, heavily loaded CI merge pipelines become integration bottlenecks that slow development. To address this, we propose a build-system-agnostic optimization framework: a lightweight model, trained on historical build logs, PR metadata, and contextual features, predicts the probability that each PR builds successfully. Based on these predictions, pull requests are prioritized dynamically, so that the PRs most likely to pass are scheduled first during peak load. The approach requires no modifications to the underlying CI infrastructure, which makes it straightforward to integrate and deploy. Evaluated on a real-world, large-scale production monorepo, it achieves significantly higher throughput than FIFO and non-learning baselines. The method thus offers a practical, scalable way to relieve merge pipeline bottlenecks without sacrificing compatibility or operational simplicity.
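The prediction step summarized above can be illustrated with a minimal sketch. The summary does not name the model or its features, so everything here is an assumption made for illustration: a plain logistic regression trained by gradient descent, two hypothetical features (a scaled changed-file count and the author's recent failure rate), and a toy history of past builds.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def predict_success(weights, bias, x) -> float:
    """Estimated probability that a PR with features x builds successfully."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical history: [files_changed / 100, author_recent_fail_rate] -> passed?
history = [
    ([0.02, 0.0], 1), ([0.05, 0.1], 1), ([0.10, 0.0], 1), ([0.08, 0.2], 1),
    ([0.80, 0.6], 0), ([0.60, 0.5], 0), ([0.90, 0.4], 0), ([0.70, 0.7], 0),
]
rows = [x for x, _ in history]
labels = [y for _, y in history]
weights, bias = train_logistic(rows, labels)

p_small = predict_success(weights, bias, [0.03, 0.1])  # small PR, reliable author
p_large = predict_success(weights, bias, [0.85, 0.6])  # large PR, flaky history
```

In this toy data, small PRs from authors with few recent failures pass more often, so the fitted model assigns `p_small` a higher probability than `p_large`. A production model would use real build logs and a richer feature set.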

📝 Abstract

Integrating changes into large monolithic software repositories is a critical step in modern software development that substantially impacts the speed of feature delivery, the stability of the codebase, and the overall productivity of development teams. To ensure the stability of the main branch, many organizations use merge pipelines that test software versions before the changes are permanently integrated. However, the load on merge pipelines is often so high that they become bottlenecks, despite the use of parallelization. Existing optimizations frequently rely on specific build systems, limiting their generalizability and applicability. In this paper, we propose to optimize the order of pull requests (PRs) in merge pipelines using practical build predictions that rely only on historical build data, PR metadata, and contextual information to estimate the likelihood of a successful build in the merge pipeline. By dynamically prioritizing PRs that are likely to pass during peak hours, this approach maximizes throughput when it matters most. Experiments conducted on a real-world, large-scale project demonstrate that predictive ordering significantly outperforms traditional first-in-first-out (FIFO) ordering as well as non-learning-based ordering strategies. Unlike alternative optimizations, this approach is agnostic to the underlying build system and thus easy to integrate into existing automated merge pipelines.
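The dynamic ordering described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the `QueuedPR` record, the `order_queue` function, and the boolean `peak` flag are hypothetical names, and how peak hours are detected is left open.

```python
from dataclasses import dataclass

@dataclass
class QueuedPR:
    pr_id: str           # hypothetical identifier
    enqueued_at: float   # arrival time, e.g. seconds since epoch
    success_prob: float  # predicted probability of a passing build

def order_queue(queue, peak: bool):
    """Return the processing order for the merge pipeline.

    Peak hours: most likely to pass first, ties broken by arrival time.
    Off-peak: plain FIFO by arrival time.
    """
    if peak:
        return sorted(queue, key=lambda pr: (-pr.success_prob, pr.enqueued_at))
    return sorted(queue, key=lambda pr: pr.enqueued_at)

queue = [
    QueuedPR("pr-a", enqueued_at=1.0, success_prob=0.40),
    QueuedPR("pr-b", enqueued_at=2.0, success_prob=0.95),
    QueuedPR("pr-c", enqueued_at=3.0, success_prob=0.70),
]

peak_order = [pr.pr_id for pr in order_queue(queue, peak=True)]
offpeak_order = [pr.pr_id for pr in order_queue(queue, peak=False)]
```

Because the sketch only reorders a queue using a score, it does not depend on how builds are executed, which is the sense in which such an ordering is agnostic to the build system.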

Problem

Research questions and friction points this paper is trying to address.

Optimizing PR order in merge pipelines to reduce bottlenecks

Predicting build success using historical data and PR metadata

Increasing merge pipeline throughput during peak hours
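To see why ordering affects throughput during peak hours, consider a deliberately simplified model: the pipeline can run a fixed number of builds in a peak window, each PR occupies one build slot, and builds succeed independently with the predicted probability. These simplifications and the name `expected_merges` are assumptions for illustration. Under them, the expected number of merged PRs is the sum of the success probabilities of the PRs that get a slot.

```python
def expected_merges(success_probs, slots):
    """Expected merged PRs when only the first `slots` PRs get built."""
    return sum(success_probs[:slots])

# Hypothetical predicted success probabilities, in arrival (FIFO) order.
fifo_probs = [0.25, 0.75, 0.5, 0.875]
prioritized_probs = sorted(fifo_probs, reverse=True)

fifo_throughput = expected_merges(fifo_probs, slots=2)
prioritized_throughput = expected_merges(prioritized_probs, slots=2)
```

With two slots, FIFO builds the PRs with probabilities 0.25 and 0.75, while the prioritized order builds those with 0.875 and 0.75, so the expected number of merges in the window is higher for the prioritized order.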

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prioritize PRs using historical build data

Dynamic ordering based on build success likelihood

Build-system-agnostic merge pipeline optimization

Bosch Group

Stuttgart, Germany

Software Engineer