🤖 AI Summary
To address the labor-intensive customization and complex API usage that make static code analysis tools hard to extend, this paper proposes an LLM-based method for automated checker generation. Given only a natural-language rule specification and a small set of test cases, the method combines an incremental logic-refinement mechanism with sub-operation-driven, fine-grained API-context retrieval to align the checking logic precisely with framework APIs. Coupling test-driven development (TDD) with iterative code generation, it achieves an average test pass rate of 82.28% across 20 PMD rules, significantly outperforming baseline approaches, and the generated checkers match the runtime performance of official implementations on real-world projects. This work establishes a lightweight, efficient, and interpretable paradigm for custom static analysis, reducing reliance on domain expertise while preserving correctness and maintainability.
📝 Abstract
With the rising demand for code quality assurance, developers not only use existing static code checkers but also seek custom checkers tailored to their specific needs. Many code-checking frameworks therefore expose extensive checker-customization interfaces. However, both the abstract checking logic and the complex API usage of large-scale checker frameworks make writing custom checkers challenging, so automated checker generation is expected to ease this burden. In this paper, we propose AutoChecker, an innovative LLM-powered approach that writes code checkers automatically from only a rule description and a test suite. To achieve comprehensive checking logic, AutoChecker incrementally updates the checker's logic, focusing on solving one selected test case at a time. To obtain precise API knowledge, each iteration applies fine-grained, logic-guided API-context retrieval: it first decomposes the checking logic into a series of sub-operations and then retrieves checker-related API contexts for each sub-operation. For evaluation, we apply AutoChecker, five baselines, and three ablation variants, using multiple LLMs, to generate checkers for 20 randomly selected PMD rules. Experimental results show that AutoChecker significantly outperforms the others across all effectiveness metrics, with an average test pass rate of 82.28%. Additionally, the checkers generated by AutoChecker can be successfully applied to real-world projects, matching the performance of official checkers.
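The abstract describes an iterative loop: pick one unsolved test case, decompose the checking logic into sub-operations, retrieve API context per sub-operation, and regenerate the checker until the suite passes. The sketch below illustrates that control flow only; all function names (`decompose_into_sub_operations`, `retrieve_api_context`, and the toy keyword-matching bodies) are hypothetical stand-ins, not the paper's actual implementation, and the LLM generation step is stubbed out so the example runs.

```python
# Hypothetical sketch of an AutoChecker-style incremental loop.
# Real sub-operation decomposition and retrieval would be LLM- and
# embedding-based; here they are toy stubs so the flow is runnable.

def decompose_into_sub_operations(rule_description):
    """Split the checking logic into sub-operations (illustrative stub)."""
    return [op.strip() for op in rule_description.split(";") if op.strip()]

def retrieve_api_context(sub_operation, api_docs):
    """Retrieve framework API docs relevant to one sub-operation
    (toy keyword match in place of real fine-grained retrieval)."""
    words = sub_operation.lower().split()
    return [doc for doc in api_docs if any(w in doc.lower() for w in words)]

def generate_checker(rule_description, test_suite, api_docs, max_iterations=10):
    """Iteratively refine a checker, solving one selected test case per round."""
    solved = set()
    checker_logic = []  # accumulated (sub-operation, API-context) pairs
    for _ in range(max_iterations):
        failing = [t for t in test_suite if t not in solved]
        if not failing:
            break                      # all test cases pass: done
        case = failing[0]              # focus on a single selected case
        for op in decompose_into_sub_operations(rule_description):
            checker_logic.append((op, retrieve_api_context(op, api_docs)))
        # In the real system, an LLM would regenerate the checker here
        # and the test suite would be re-run; we simply mark the case solved.
        solved.add(case)
    pass_rate = len(solved) / len(test_suite)
    return checker_logic, pass_rate
```

The one-case-at-a-time structure mirrors TDD: each round targets a concrete failing example, so the accumulated logic stays grounded in observable test behavior rather than the full rule at once.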