ClassInvGen: Class Invariant Synthesis using Large Language Models

📅 2025-02-26
🤖 AI Summary
This work addresses the difficulty of automatically synthesizing high-quality class invariants for C++. It proposes ClassInvGen, an LLM-based method that co-generates executable class invariants and test inputs, leveraging LLMs' ability to synthesize pure functions. The authors also contribute a benchmark of standard C++ data structures, together with a harness that measures the correctness and completeness of generated specifications using tests and mutants. On this benchmark, ClassInvGen outperforms both a pure LLM-based baseline that generates specifications from code and Daikon, a prior data-driven invariant inference technique. A case study on several classes within a widely used, high-integrity C++ codebase demonstrates its applicability to real-world code.

📝 Abstract
Formal program specifications in the form of preconditions, postconditions, and class invariants have several benefits for the construction and maintenance of programs. They not only aid in program understanding due to their unambiguous semantics but can also be enforced dynamically (or even statically when the language supports a formal verifier). However, synthesizing high-quality specifications in an underlying programming language is limited by the expressivity of the specifications or the need to express them in a declarative manner. Prior work has demonstrated the potential of large language models (LLMs) for synthesizing high-quality method pre/postconditions for Python and Java, but does not consider class invariants. In this work, we describe ClassInvGen, a method for co-generating executable class invariants and test inputs to produce high-quality class invariants for a mainstream language such as C++, leveraging LLMs' ability to synthesize pure functions. We show that ClassInvGen outperforms a pure LLM-based technique to generate specifications (from code) as well as prior data-driven invariant inference techniques such as Daikon. We contribute a benchmark of standard C++ data structures along with a harness that can help measure both the correctness and completeness of generated specifications using tests and mutants. We also demonstrate its applicability to real-world code by performing a case study on several classes within a widely used and high-integrity C++ codebase.
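To make the notion of an executable class invariant concrete, here is a minimal sketch. It is not taken from the paper: the `BoundedStack` class, the `check_invariant` method, and the test sequence are all hypothetical names chosen for illustration. The invariant is written as a side-effect-free `const` member function, and a small test input (a sequence of public calls) checks it after every call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example class (an assumption for illustration, not from the paper).
class BoundedStack {
public:
    explicit BoundedStack(std::size_t capacity) : capacity_(capacity) {
        data_.reserve(capacity);
    }

    // Pushes a value; returns false and leaves the stack unchanged when full.
    bool push(int value) {
        if (data_.size() >= capacity_) return false;
        data_.push_back(value);
        return true;
    }

    // Pops the top value; returns false when the stack is empty.
    bool pop() {
        if (data_.empty()) return false;
        data_.pop_back();
        return true;
    }

    std::size_t size() const { return data_.size(); }

    // Executable class invariant: a pure (const, side-effect-free) predicate
    // over the object's state that should hold after every public method.
    bool check_invariant() const {
        return data_.size() <= capacity_;
    }

private:
    std::size_t capacity_;
    std::vector<int> data_;
};

// A test input: a sequence of public calls, with the invariant evaluated
// after construction and after each call. Returns true if it always held.
bool invariant_holds_on_test_sequence() {
    BoundedStack s(2);
    if (!s.check_invariant()) return false;
    s.push(1);
    if (!s.check_invariant()) return false;
    s.push(2);
    if (!s.check_invariant()) return false;
    s.push(3);  // rejected: the stack is already full
    if (!s.check_invariant()) return false;
    s.pop();
    if (!s.check_invariant()) return false;
    s.pop();
    if (!s.check_invariant()) return false;
    s.pop();    // rejected: the stack is already empty
    return s.check_invariant();
}
```

Because the invariant is ordinary C++ code rather than a declarative formula, it can be compiled with the class and evaluated dynamically by any test driver, for example by calling `invariant_holds_on_test_sequence()` from a test's `main`.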
Problem

Research questions and friction points this paper is trying to address.

Synthesize class invariants using LLMs
Generate executable invariants for C++
Outperform traditional invariant inference techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

ClassInvGen leverages LLMs
Generates executable class invariants
Outperforms prior specification techniques
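The abstract mentions measuring correctness and completeness of generated specifications using tests and mutants. The sketch below shows one plausible reading of that idea, under stated assumptions: an invariant counts as correct if it holds on tests of the original code, and it is more complete the more mutants it rejects. The `Counter` type, the two candidate invariants, and the single hand-written mutant are hypothetical and are not the paper's harness.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class state (an assumption for illustration).
struct Counter {
    int count = 0;
    int capacity = 3;
};

// Original method: increments only while below capacity.
void add_original(Counter& c) { if (c.count < c.capacity) ++c.count; }

// Mutant: the boundary check `<` is changed to `<=` (an off-by-one bug).
void add_mutant(Counter& c) { if (c.count <= c.capacity) ++c.count; }

using Invariant = std::function<bool(const Counter&)>;
using Method = std::function<void(Counter&)>;

// Runs a simple test (call `add` a number of times) and reports whether
// the invariant held initially and after every call.
bool holds_throughout(const Invariant& inv, const Method& add, int calls) {
    Counter c;
    if (!inv(c)) return false;
    for (int i = 0; i < calls; ++i) {
        add(c);
        if (!inv(c)) return false;
    }
    return true;
}

// Two candidate invariants: a weak one and a stronger one.
bool weak_invariant(const Counter& c) { return c.count >= 0; }
bool strong_invariant(const Counter& c) {
    return c.count >= 0 && c.count <= c.capacity;
}

// Correctness (assumed definition): the invariant holds on the original code.
bool is_correct(const Invariant& inv) {
    return holds_throughout(inv, add_original, 5);
}

// Completeness signal (assumed definition): the invariant fails on the mutant,
// i.e. it "kills" the mutant.
bool kills_mutant(const Invariant& inv) {
    return !holds_throughout(inv, add_mutant, 5);
}
```

In this toy setting both candidates are correct on the original code, but only `strong_invariant` kills the mutant, which is the kind of distinction a mutant-based harness is meant to surface.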
Authors
Chuyue Sun
Stanford University, USA
Viraj Agashe
Microsoft Research, India
Saikat Chakraborty
Microsoft Research, USA
Jubi Taneja
Microsoft Research, USA
Clark Barrett
Stanford University
Formal Methods, Satisfiability Modulo Theories, Automated Reasoning, Verification, Security
David Dill
Stanford University, USA
Xiaokang Qiu
Purdue University
Logic, Automated Deduction, Program Verification, Program Synthesis
Shuvendu K. Lahiri
Microsoft Research, USA