🤖 AI Summary
To address the high manual cost of constructing multiple functionally equivalent program versions for N-version programming, and the difficulty of ensuring diversity among them, this paper proposes an automated framework that combines large language model (LLM)-based generation with formal equivalence verification. Methodologically, it uses LLMs to generate semantically equivalent yet structurally diverse variants of real-world C code, including variants written in a different programming language. It employs SMT solvers (Z3) to check semantic equivalence and systematically measures diversity, showing that verified variants differ statically after compilation and diverge in internal behavior at runtime. The key contribution is the integration of LLM-driven synthesis with formal verification, which automates the end-to-end construction of fault-tolerant N-version binary components. Experiments show that the resulting variants protect C code against real-world miscompilation bugs in the Clang compiler. This indicates that building N-version software can be largely automated.
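The idea of "semantically equivalent yet structurally diverse" variants can be illustrated with a small, hypothetical example. The two functions below (`popcount_shift` and `popcount_kernighan`, both names invented here) count set bits with different control flow. Their equivalence is checked by brute force over the entire 16-bit input domain. This is an assumption-laden stand-in for the SMT-based verification described above: exhaustive enumeration is a complete proof only because the domain is tiny, and it is not the tool's actual verification method.

```c
#include <assert.h>
#include <stdint.h>

/* Variant A (hypothetical): test each of the 16 bits in turn. */
unsigned popcount_shift(uint16_t x) {
    unsigned count = 0;
    for (int i = 0; i < 16; i++)
        count += (x >> i) & 1u;
    return count;
}

/* Variant B (hypothetical): Kernighan's trick, clearing the lowest set bit
   on each iteration, so the loop runs once per set bit instead of 16 times. */
unsigned popcount_kernighan(uint16_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= (uint16_t)(x - 1);
        count++;
    }
    assert(count <= 16); /* postcondition: a 16-bit value has at most 16 set bits */
    return count;
}

/* Exhaustively compare both variants over all 65536 inputs.
   Returns -1 if they agree everywhere, otherwise the first counterexample. */
long first_counterexample(void) {
    for (uint32_t v = 0; v <= UINT16_MAX; v++) {
        uint16_t x = (uint16_t)v;
        if (popcount_shift(x) != popcount_kernighan(x))
            return (long)x;
    }
    return -1;
}
```

A result of -1 from `first_counterexample()` means the variants are functionally equivalent on every possible input, even though their loops, branches, and iteration counts differ. That difference is the kind of structural diversity that tends to survive compilation. For realistic input types such as 32- or 64-bit integers or pointers, enumeration becomes infeasible, which is why a solver-based check is needed.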
📝 Abstract
N-Version Programming is a well-known methodology for developing fault-tolerant systems. It achieves fault detection and correction at runtime by adding diverse redundancy to programs, minimizing fault mode overlap between redundant program variants. In this work, we propose the automated generation of program variants using large language models. We design, develop and evaluate Galápagos: a tool for generating program variants using LLMs, validating their correctness and equivalence, and using them to assemble N-Version binaries. We evaluate Galápagos by creating N-Version components of real-world C code. Our original results show that Galápagos can produce program variants that are proven to be functionally equivalent, even when the variants are written in a different programming language. Our systematic diversity measurement indicates that functionally equivalent variants produced by Galápagos are statically different after compilation and present diverging internal behavior at runtime. We demonstrate that the variants produced by Galápagos can protect C code against real miscompilation bugs that affect the Clang compiler. Overall, our paper shows that producing N-Version software can be drastically automated by advanced use of practical formal verification and generative language models.
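At runtime, an N-Version component executes its variants on the same input and compares their outputs. Majority voting is one common way to turn that redundancy into fault detection and correction. The abstract does not describe how Galápagos assembles or votes within its N-Version binaries, so the following is only a minimal sketch under assumed conditions. It assumes deterministic variants exposed as function pointers with an `int32_t -> int32_t` signature and a strict-majority voter. All names (`nv_execute`, `nv_result`, `sum_loop`, `sum_formula`, `sum_faulty`) are hypothetical. `sum_faulty` is deliberately wrong, simulating a variant whose compiled code was affected by a miscompilation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant signature: every variant computes the same function. */
typedef int32_t (*variant_fn)(int32_t);

typedef struct {
    int32_t value;     /* majority output; meaningful only if has_majority */
    int has_majority;  /* 1 if strictly more than half of the variants agree */
    size_t dissenters; /* variants disagreeing with the majority (set only if has_majority) */
} nv_result;

/* Run all n variants on the same input and take a strict majority vote. */
nv_result nv_execute(const variant_fn *variants, size_t n, int32_t input) {
    assert(n >= 1 && n <= 16); /* fixed-size buffer below */
    int32_t outputs[16];
    for (size_t i = 0; i < n; i++)
        outputs[i] = variants[i](input);

    nv_result r = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        size_t votes = 0;
        for (size_t j = 0; j < n; j++)
            if (outputs[j] == outputs[i])
                votes++;
        if (2 * votes > n) { /* strict majority found */
            r.value = outputs[i];
            r.has_majority = 1;
            r.dissenters = n - votes;
            break;
        }
    }
    return r;
}

/* Example variants of sum(1..x), assumed valid for 0 <= x <= 46340 (no overflow). */
int32_t sum_loop(int32_t x) {
    int32_t s = 0;
    for (int32_t i = 1; i <= x; i++)
        s += i;
    return s;
}

int32_t sum_formula(int32_t x) {
    return x * (x + 1) / 2;
}

/* Deliberately faulty: off by one, standing in for a miscompiled variant. */
int32_t sum_faulty(int32_t x) {
    return x * (x - 1) / 2;
}
```

With three variants `{sum_loop, sum_formula, sum_faulty}` and input 10, the voter returns 55 and reports one dissenter, so the fault is both detected and masked. With only two disagreeing variants, no strict majority exists: the fault is detected but cannot be corrected. This is why masking a single fault by majority vote needs at least three variants.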