🤖 AI Summary
Mathematical optimization models generated by large language models (LLMs) often suffer from unverifiable semantic correctness and constraint consistency. Method: This paper proposes the first multi-agent automated verification framework specifically designed for optimization modeling. It innovatively adapts mutation testing—a software engineering technique—to the optimization domain by introducing a domain-specific testing API and orchestrating multiple LLM agents to collaboratively generate test cases, construct model mutants, and assess semantic consistency. The approach integrates LLMs, multi-agent coordination, automated testing API generation, and optimization-aware mutation strategies, while leveraging test coverage metrics to quantify verification efficacy. Contribution/Results: Experiments demonstrate that the framework achieves significantly higher mutation coverage than baseline methods, effectively detects modeling errors, and substantially enhances the reliability and trustworthiness of LLM-generated optimization models.
📝 Abstract
Recently, using Large Language Models (LLMs) to generate optimization models from natural language descriptions has became increasingly popular. However, a major open question is how to validate that the generated models are correct and satisfy the requirements defined in the natural language description. In this work, we propose a novel agent-based method for automatic validation of optimization models that builds upon and extends methods from software testing to address optimization modeling . This method consists of several agents that initially generate a problem-level testing API, then generate tests utilizing this API, and, lastly, generate mutations specific to the optimization model (a well-known software testing technique assessing the fault detection power of the test suite). In this work, we detail this validation framework and show, through experiments, the high quality of validation provided by this agent ensemble in terms of the well-known software testing measure called mutation coverage.