Roseau: Fast, Accurate, Source-based API Breaking Change Analysis in Java

📅 2025-07-23
🤖 AI Summary
Existing Java tools for detecting API breaking changes (BCs), such as JApiCmp and Revapi, operate on compiled JARs. This limits their use in large-scale longitudinal studies of API evolution and in fine-grained analyses such as commit-level BC detection. This paper presents Roseau, a static analysis tool that builds technology-agnostic API models, enriched with semantic analyses, from either source code or bytecode. It then compares these models to detect BCs between any two versions of a library (releases, commits, branches). On an extended and refined benchmark of BCs, Roseau reaches an F1-score of 0.99, compared with 0.86 for JApiCmp and 0.91 for Revapi. On 60 popular libraries from Maven Central, it detects BCs between versions in under two seconds, even for libraries with hundreds of thousands of lines of code. When tracking Guava's API over 14 years and 6,839 commits, it reduces analysis time from a few days to a few minutes. Because it can work directly on source code, Roseau removes the dependence on binary artifacts and makes large-scale, fine-grained API evolution studies practical.


📝 Abstract
Understanding API evolution and the introduction of breaking changes (BCs) in software libraries is essential for library maintainers to manage backward compatibility and for researchers to conduct empirical studies on software library evolution. In Java, tools such as JApiCmp and Revapi are commonly used to detect BCs between library releases, but their reliance on binary JARs limits their applicability. This restriction hinders large-scale longitudinal studies of API evolution and fine-grained analyses such as commit-level BC detection. In this paper, we introduce Roseau, a novel static analysis tool that constructs technology-agnostic API models from library code equipped with rich semantic analyses. API models can be analyzed to study API evolution and compared to identify BCs between any two versions of a library (releases, commits, branches, etc.). Unlike traditional approaches, Roseau can build API models from source code or bytecode, and is optimized for large-scale longitudinal analyses of library histories. We assess the accuracy, performance, and suitability of Roseau for longitudinal studies of API evolution, using JApiCmp and Revapi as baselines. We extend and refine an established benchmark of BCs and show that Roseau achieves higher accuracy (F1 = 0.99) than JApiCmp (F1 = 0.86) and Revapi (F1 = 0.91). We analyze 60 popular libraries from Maven Central and find that Roseau delivers excellent performance, detecting BCs between versions in under two seconds, including in libraries with hundreds of thousands of lines of code. We further illustrate the limitations of JApiCmp and Revapi for longitudinal studies and the novel analysis capabilities offered by Roseau by tracking the evolution of Google's Guava API and the introduction of BCs over 14 years and 6,839 commits, reducing analysis times from a few days to a few minutes.
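A deliberately minimal sketch can illustrate the core idea of comparing two API models to find BCs. Everything below is hypothetical and is not Roseau's actual API or model. This includes the `ApiDiffSketch` class, the `TypeDecl` and `BreakingChange` records, and the kind names `TYPE_REMOVED`, `CLASS_NOW_FINAL` and `METHOD_REMOVED`. Each version is reduced to a map from type names to the type's `final` flag and its public method signatures. The comparison then reports three common kinds of BC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified API model: NOT Roseau's actual data structures.
public class ApiDiffSketch {

    // One exported type: fully qualified name, whether it is final,
    // and the signatures of its public methods.
    public record TypeDecl(String name, boolean isFinal, Set<String> methods) {}

    // A detected breaking change: its kind and the affected symbol.
    public record BreakingChange(String kind, String symbol) {}

    // Compares the old model (v1) with the new one (v2), both keyed by type name,
    // and reports removed types, classes that became final, and removed methods.
    public static List<BreakingChange> diff(Map<String, TypeDecl> v1, Map<String, TypeDecl> v2) {
        List<BreakingChange> bcs = new ArrayList<>();
        for (TypeDecl oldType : v1.values()) {
            TypeDecl newType = v2.get(oldType.name());
            if (newType == null) {
                // Clients referencing this type no longer compile or link.
                bcs.add(new BreakingChange("TYPE_REMOVED", oldType.name()));
                continue;
            }
            if (!oldType.isFinal() && newType.isFinal()) {
                // Existing subclasses in client code are now illegal.
                bcs.add(new BreakingChange("CLASS_NOW_FINAL", oldType.name()));
            }
            for (String m : oldType.methods()) {
                if (!newType.methods().contains(m)) {
                    bcs.add(new BreakingChange("METHOD_REMOVED", oldType.name() + "#" + m));
                }
            }
        }
        return bcs;
    }

    public static void main(String[] args) {
        Map<String, TypeDecl> v1 = Map.of(
            "com.example.Cache", new TypeDecl("com.example.Cache", false, Set.of("get(Object)", "put(Object,Object)")),
            "com.example.Util", new TypeDecl("com.example.Util", false, Set.of("hash(String)")));
        Map<String, TypeDecl> v2 = Map.of(
            "com.example.Cache", new TypeDecl("com.example.Cache", true, Set.of("get(Object)")));
        // Prints three BCs (order depends on map iteration order).
        diff(v1, v2).forEach(System.out::println);
    }
}
```

Additions are ignored in this sketch because they are usually compatible, but not always. For example, adding an abstract method to an interface breaks existing implementors. Dedicated tools also cover many more cases, such as changed parameter or return types and reduced visibility.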
Problem

Research questions and friction points this paper is trying to address.

Detecting API breaking changes in Java libraries accurately
Enabling large-scale longitudinal studies of API evolution efficiently
Analyzing source code as well as bytecode, so that fine-grained (e.g., commit-level) detection does not require compiled JARs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Technology-agnostic API models enriched with semantic analyses
API models built from either source code or bytecode
Optimized for large-scale longitudinal API studies
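Commit-level longitudinal analysis amounts to building one API model per commit and comparing each commit with its predecessor. The sketch below illustrates only that loop, under simplifying assumptions. The hypothetical `LongitudinalSketch` class represents each snapshot as a plain set of exported member signatures and counts only removed elements. A real API model and comparison are far richer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of commit-by-commit API evolution tracking.
// Each snapshot is the set of exported member signatures at one commit,
// standing in for an API model built directly from that commit's sources.
public class LongitudinalSketch {

    public record Snapshot(String commit, Set<String> api) {}

    // For each commit after the first, counts API elements removed
    // relative to the immediately preceding commit.
    public static List<Integer> removalsPerCommit(List<Snapshot> history) {
        List<Integer> counts = new ArrayList<>();
        for (int i = 1; i < history.size(); i++) {
            Set<String> removed = new HashSet<>(history.get(i - 1).api());
            removed.removeAll(history.get(i).api());
            counts.add(removed.size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Snapshot> history = List.of(
            new Snapshot("c1", Set.of("A.f()", "A.g()")),
            new Snapshot("c2", Set.of("A.f()", "A.g()", "A.h()")),
            new Snapshot("c3", Set.of("A.f()", "A.h()")),
            new Snapshot("c4", Set.of("A.h()")));
        List<Integer> counts = removalsPerCommit(history);
        for (int i = 0; i < counts.size(); i++) {
            System.out.println(history.get(i + 1).commit() + ": " + counts.get(i) + " removed");
        }
    }
}
```

Running `main` prints `c2: 0 removed`, `c3: 1 removed` and `c4: 1 removed`. The loop needs one model per commit and one comparison per consecutive pair of commits. When models can be built directly from sources, no commit has to be compiled into a JAR first, which is the dependency that limits binary-based tools.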