AI Summary
This work addresses the limitations of traditional black-box API testing, which struggles to generate effective state-dependent call sequences due to a lack of behavioral semantics and suffers from the test oracle problem. The authors propose a model-checking-based systematic testing approach that employs TLA+ to formally model API state evolution and leverages the TLC model checker to exhaustively explore the reachable states. A coverage-guided, breadth-first traversal of the resulting state-space graph mitigates state explosion and produces test sequences with provable coverage guarantees over the behavioral model. To strengthen test oracles beyond HTTP status codes, the approach integrates Glacier, an executable first-order logic contract language. Empirical evaluation on the EvoMaster benchmark demonstrates complete state coverage and effective detection of multi-operation interaction bugs, and a scalability analysis characterizes the method's practical limits and applicability requirements.
Abstract
Automated black-box testing of APIs typically relies on interface specifications that define available operations and data schemas, but offer limited or no behavioural semantics. This semantic gap amplifies the test-oracle problem and limits the generation of effective, stateful call sequences. We introduce IcePick, a framework that achieves systematic state-space coverage for API testing by leveraging model checking. IcePick uses TLA+ to formally model API state evolution, employs the TLC model checker to exhaustively explore reachable states, and generates test sequences that provably cover the behavioural model. To mitigate state-space explosion and improve sequence extraction, we introduce a coverage-guided breadth-first traversal of the TLC state-space graph. To address oracle limitations beyond HTTP status codes, we propose Glacier, a first-order logic contract language that enriches API specifications with executable semantic contracts, enabling automated behavioural verification during test execution. We evaluate IcePick on EvoMaster Benchmark systems, demonstrating that model-checking-guided exploration achieves complete state coverage and reveals faults in multi-operation interactions. We also analyse scalability to characterise practical limits and applicability requirements. Overall, IcePick provides reproducible test suites with strong coverage guarantees for critical API-based systems.
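The idea of extracting test sequences from a state-space graph by breadth-first traversal can be illustrated with a minimal sketch. This is not IcePick's actual algorithm: the toy graph, the state and operation names, and the helper functions `shortest_paths` and `covering_sequences` are all hypothetical, and in practice the graph would come from the model checker's explored state space rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical state graph: state -> list of (API operation, next state).
GRAPH = {
    "empty": [("POST /items", "created")],
    "created": [("PUT /items/1", "updated"), ("DELETE /items/1", "deleted")],
    "updated": [("DELETE /items/1", "deleted")],
    "deleted": [],
}


def shortest_paths(graph, init):
    """Breadth-first search from init.

    Returns {state: [(operation, state), ...]}, the shortest step list
    that reaches each reachable state.
    """
    paths = {init: []}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        for operation, nxt in graph.get(state, []):
            if nxt not in paths:  # first visit in BFS order is a shortest path
                paths[nxt] = paths[state] + [(operation, nxt)]
                queue.append(nxt)
    return paths


def covering_sequences(graph, init):
    """Pick operation sequences that together visit every reachable state.

    Longest paths are considered first, so a path that is only a prefix
    of an already selected one is skipped (assumed heuristic).
    """
    paths = shortest_paths(graph, init)
    covered = {init}
    sequences = []
    for state in sorted(paths, key=lambda s: -len(paths[s])):
        if state in covered:
            continue
        steps = paths[state]
        covered.update(s for _, s in steps)
        sequences.append([operation for operation, _ in steps])
    return sequences


if __name__ == "__main__":
    for sequence in covering_sequences(GRAPH, "empty"):
        print(" -> ".join(sequence))
```

Each returned sequence is a candidate test case: replaying its operations against the system under test should drive it through the corresponding model states, at which point semantic contracts could be checked after every call.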