🤖 AI Summary
This work addresses the low quality and limited expressiveness of automatically generated class invariants for C++. We propose the first large language model (LLM)-based collaborative synthesis method that jointly generates executable class invariants and corresponding test inputs. Our approach combines LLM-based synthesis of pure functions, executable specification modeling, test-driven validation, and mutation analysis. We further contribute the first C++-specific class invariant benchmark suite, together with an evaluation framework grounded in testing and mutation analysis. Experimental results show that our method significantly outperforms both pure LLM baselines and Daikon on standard C++ data structures. Moreover, it synthesizes semantically precise, practical, and robust invariants for classes drawn from a widely used, high-integrity industrial C++ codebase, demonstrating its real-world applicability and reliability.
📝 Abstract
Formal program specifications in the form of preconditions, postconditions, and class invariants have several benefits for the construction and maintenance of programs. They not only aid in program understanding due to their unambiguous semantics but can also be enforced dynamically (or even statically when the language supports a formal verifier). However, synthesizing high-quality specifications in an underlying programming language is limited by the expressivity of the specifications or the need to express them in a declarative manner. Prior work has demonstrated the potential of large language models (LLMs) for synthesizing high-quality method pre/postconditions for Python and Java, but does not consider class invariants. In this work, we describe ClassInvGen, a method for co-generating executable class invariants and test inputs to produce high-quality class invariants for a mainstream language such as C++, leveraging LLMs' ability to synthesize pure functions. We show that ClassInvGen outperforms a pure LLM-based technique to generate specifications (from code) as well as prior data-driven invariant inference techniques such as Daikon. We contribute a benchmark of standard C++ data structures along with a harness that can help measure both the correctness and completeness of generated specifications using tests and mutants. We also demonstrate its applicability to real-world code by performing a case study on several classes within a widely used and high-integrity C++ codebase.
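To make the notion of an "executable class invariant" concrete, here is a minimal illustrative sketch (not taken from the paper): a bounded stack whose invariant is written as a pure, side-effect-free `const` member predicate, checked dynamically at the entry and exit of each mutating method. The class name, invariant, and checking style are hypothetical, chosen only to illustrate the kind of artifact the abstract describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: a bounded stack with an executable class invariant.
class BoundedStack {
public:
    explicit BoundedStack(std::size_t capacity) : capacity_(capacity) {}

    void push(int x) {
        assert(invariant());              // enforce on entry
        if (data_.size() < capacity_) {
            data_.push_back(x);           // silently drop pushes past capacity
        }
        assert(invariant());              // and on exit
    }

    int pop() {
        assert(invariant() && !data_.empty());
        int top = data_.back();
        data_.pop_back();
        assert(invariant());
        return top;
    }

    std::size_t size() const { return data_.size(); }

    // The class invariant as a pure predicate: the stack never holds
    // more elements than its fixed capacity.
    bool invariant() const { return data_.size() <= capacity_; }

private:
    std::size_t capacity_;
    std::vector<int> data_;
};
```

Because the invariant is ordinary executable code rather than a declarative annotation, it can be exercised directly by generated test inputs and scored with mutation analysis, which is the evaluation setup the abstract outlines.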