🤖 AI Summary
Legacy C code in safety-critical domains often lacks formal specifications and exhibits complex features—such as loops and pointer aliasing—that impede rigorous verification.
Method: This paper proposes a novel framework that generates function summaries at multiple abstraction levels by combining symbolic execution (VST-A), deductive verification (Frama-C), and large language models (LLMs). It automatically generates relatively strongest postconditions (RSPs), using LLMs to infer loop invariants from predefined templates and an iterative refinement loop with Frama-C to guarantee soundness. From the generated RSPs, it then synthesizes the strongest non-redundant postconditions expressed in a given domain-specific language (DSL).
Results: Evaluated on multiple embedded C benchmarks, our generated summaries achieve 100% formal correctness—fully satisfying verification obligations—while simultaneously attaining both the precision required for mechanized proof and the readability essential for human comprehension. The approach significantly outperforms existing state-of-the-art methods in both correctness guarantees and usability.
📝 Abstract
Function summaries, which characterize the behavior of code segments (typically functions) through preconditions and postconditions, are essential for understanding, reusing, and verifying software, particularly in safety-critical domains such as aerospace embedded systems. However, mission-critical legacy code, although a valuable asset for reuse, often lacks formal specifications. Automatically generating function summaries for C programs is challenging because of complex features such as loops, nested function calls, and pointer aliasing. Moreover, function summaries should support multiple abstraction levels to meet diverse requirements, e.g., precise summaries that capture full functionality for formal verification and intuitive summaries for human understanding. To address these challenges, we first propose a novel framework that combines symbolic execution, large language models (LLMs), and formal verification to generate Relatively Strongest Postconditions (RSPs) and build function summaries that fully capture program behavior. Our approach leverages VST-A's symbolic execution to precisely track program execution paths and state transitions, employs LLMs to infer loop invariants based on predefined templates, and uses Frama-C to guarantee the soundness of the generated summaries in an iterative refinement loop. Furthermore, from the generated RSPs, we automatically synthesize the strongest non-redundant postconditions expressed in a given domain-specific language. We compare our approach with existing work through extensive experiments.
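To make the notion of a function summary concrete, the sketch below annotates a small C function with an ACSL contract of the kind Frama-C checks. The function `array_max`, its contract, and its loop invariants are hypothetical and written by hand for illustration. They are not taken from the paper, and they are not output of the framework, which would infer such invariants with an LLM guided by templates. Whether a prover discharges every obligation depends on the Frama-C version and solver configuration.

```c
#include <stddef.h>

/* Hypothetical example: a function summary written as an ACSL contract.
 * The preconditions (requires) and postconditions (ensures) together
 * describe what the function does without exposing its loop. */

/*@ requires n > 0;
  @ requires \valid_read(a + (0 .. n-1));
  @ assigns \nothing;
  @ ensures \forall integer k; 0 <= k < n ==> \result >= a[k];
  @ ensures \exists integer k; 0 <= k < n && \result == a[k];
  @*/
int array_max(const int *a, size_t n)
{
    int best = a[0];
    size_t i;

    /* Loop invariants: the kind of facts a verifier needs in order to
     * connect the loop body to the postconditions above. */
    /*@ loop invariant 1 <= i <= n;
      @ loop invariant \forall integer k; 0 <= k < i ==> best >= a[k];
      @ loop invariant \exists integer k; 0 <= k < i && best == a[k];
      @ loop assigns i, best;
      @ loop variant n - i;
      @*/
    for (i = 1; i < n; i++) {
        if (a[i] > best)
            best = a[i];
    }
    return best;
}
```

The two `ensures` clauses together pin the result down as exactly the maximum element, which is the flavor of a precise summary suited to formal verification. A summary meant for human readers might keep only the first clause, or restate the result in terms of a domain-level predicate.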