🤖 AI Summary
Localizing compatibility issues that arise when the Coq proof assistant is upgraded is inefficient, which hampers maintainability and developer productivity.
Method: We propose the first automated test-case minimization method tailored to formal proofs, designing and implementing the Coq Bug Minimizer, a tool integrated with coqbot that triggers automatically on Coq reverse CI failures. Unlike compiler-oriented approaches, we systematically identify and address challenges specific to proof assistants: complex dependency structures, semantic sensitivity, and non-local compilation effects. Our framework combines AST traversal, dependency-aware pruning, and incremental compilation validation.
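To make the idea of oracle-guided test-case reduction concrete, here is a minimal sketch of a greedy, delta-debugging-style pass over the top-level sentences of a file. It is a generic illustration, not the Coq Bug Minimizer's implementation: the `minimize` function, the `still_fails` oracle, and the toy sentences are hypothetical. In practice the oracle would compile each candidate (for example with `coqc`) and check that the original error still appears, which is why the cost of each check dominates and incremental compilation matters.

```python
from typing import Callable, List

# Hypothetical oracle: returns True when the candidate (a list of top-level
# sentences) still reproduces the original failure. A real oracle would write
# the candidate to a .v file, compile it, and compare the error message.
Oracle = Callable[[List[str]], bool]


def minimize(sentences: List[str], still_fails: Oracle) -> List[str]:
    """Greedy one-sentence-at-a-time reduction (simplified delta debugging).

    Tries deleting each sentence; a deletion is kept only if the oracle
    confirms the failure still reproduces. Repeats until no single sentence
    can be removed (a fixpoint).
    """
    if not still_fails(sentences):
        raise ValueError("input does not reproduce the failure")
    current = list(sentences)
    changed = True
    while changed:
        changed = False
        # Walk backwards: later sentences may depend on earlier ones, so
        # removing them first tends to unlock further deletions.
        i = len(current) - 1
        while i >= 0:
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
            i -= 1
    return current


def toy_oracle(candidate: List[str]) -> bool:
    # Toy stand-in for "compilation fails with the same error": the failing
    # lemma needs the definition of f to trigger the original message.
    return "Definition f := 0." in candidate and "Lemma bad : f = 1." in candidate


src = [
    "Require Import Arith.",
    "Definition f := 0.",
    "Definition g := 1.",
    "Lemma bad : f = 1.",
    "Lemma ok : g = 1. Proof. reflexivity. Qed.",
]
print(minimize(src, toy_oracle))
# ['Definition f := 0.', 'Lemma bad : f = 1.']
```

Deleting one sentence at a time keeps the sketch simple but calls the oracle once per sentence per round; real reducers typically also try removing larger chunks and use dependency information to avoid candidates that are certain to fail for an unrelated reason.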
Contribution/Results: Evaluated on over 150 real-world CI failures, the tool reduces failures to smaller test cases roughly 75% of the time and produces a fully standalone test case 89% of the time. Reduced test cases are on average about one-third the size of the original and compile in 1.25 seconds on average, with 75% compiling in under 0.5 seconds. We expect these insights to generalize to other interactive theorem provers.
📝 Abstract
As the adoption of proof assistants increases, there is a need for efficiency in identifying, documenting, and fixing compatibility issues that arise from proof assistant evolution. We present the Coq Bug Minimizer, a tool for reproducing buggy behavior with minimal and standalone files, integrated with coqbot to trigger automatically on Coq reverse CI failures. Our tool eliminates the overhead of having to download, set up, compile, and then explore and understand large developments, enabling Coq developers to easily obtain modular test-case files for fast experimentation. In this paper, we describe insights about how test-case reduction in Coq differs from test-case reduction in traditional compilers. We expect that our insights will generalize to other proof assistants. We evaluate the Coq Bug Minimizer on over 150 CI failures. Our tool succeeds in reducing failures to smaller test cases roughly 75% of the time. The minimizer produces a fully standalone test case 89% of the time, and the reduced test case is on average about one-third the size of the original. The average reduced test case compiles in 1.25 seconds, with 75% taking under half a second.