🤖 AI Summary
This study addresses a critical limitation in the evaluation of Java static analysis frameworks: the common assumption that analysis algorithms and configurations compose monotonically and are semantically comparable. That assumption frequently breaks down in the presence of modern language features such as lambdas and reflection, yielding inconsistent call graphs. By establishing a precision partial order, the authors conduct a large-scale empirical study across four major frameworks (Soot, SootUp, WALA, and Doop), revealing, for the first time, significant intra- and inter-framework semantic gaps in call graph construction. Their findings demonstrate that algorithmic precision rankings become unstable under advanced language constructs, that configurations and algorithms can co-fail in nontrivial ways, and that irreconcilable semantic discrepancies exist between frameworks. These results challenge conventional evaluation paradigms in static analysis and advocate for a new perspective that jointly considers algorithms, configurations, and framework-specific semantics.
📝 Abstract
Java static analysis frameworks are commonly compared under the assumption that analysis algorithms and configurations compose monotonically and yield semantically comparable results across tools. In this work, we show that this assumption is fundamentally flawed. We present a large-scale empirical study of semantic consistency within and across four widely used Java static analysis frameworks: Soot, SootUp, WALA, and Doop. Using precision partial orders over analysis algorithms and configurations, we systematically identify violations where increased precision introduces new call-graph edges or amplifies inconsistencies. Our results reveal three key findings. First, algorithmic precision orders frequently break within frameworks due to modern language features such as lambdas, reflection, and native modeling. Second, configuration choices strongly interact with analysis algorithms, producing synergistic failures that exceed the effects of algorithm or configuration changes alone. Third, cross-framework comparisons expose irreconcilable semantic gaps, demonstrating that different frameworks operate over incompatible notions of call-graph ground truth. These findings challenge prevailing evaluation practices in static analysis and highlight the need to reason jointly about algorithms, configurations, and framework semantics when assessing precision and soundness.
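The precision partial orders used in the study can be checked mechanically: if analysis A is expected to be at least as precise as analysis B, then every call-graph edge A reports should also appear in B's over-approximating call graph, and any edge A reports that B omits is a violation of the expected order. A minimal sketch of that check, assuming call graphs are represented as edge sets (the `Edge` record and `violations` helper here are illustrative, not part of Soot, SootUp, WALA, or Doop):

```java
import java.util.Set;

// Illustrative sketch, not any framework's API: a call-graph edge
// from a caller method to a callee method, identified by name.
record Edge(String caller, String callee) {}

final class PrecisionOrder {
    // Given two call graphs where `morePrecise` is expected to be a
    // refinement of `lessPrecise`, return the edges that violate the
    // expected partial order: edges the supposedly more precise
    // analysis reports that the less precise one does not.
    static Set<Edge> violations(Set<Edge> morePrecise, Set<Edge> lessPrecise) {
        var extra = new java.util.HashSet<>(morePrecise);
        extra.removeAll(lessPrecise);
        return extra;
    }
}
```

For example, if an analysis with lambda modeling enabled discovers an edge into a synthesized lambda method that a nominally less precise configuration misses entirely, that edge shows up in `violations` even though the first analysis is "more precise", which is exactly the kind of order breakage the study documents.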