🤖 AI Summary
This work addresses coordination failures in decentralized systems caused by semantic errors in local optimization agents generated by large language models (LLMs), errors that persist even when each agent passes local validation. The authors propose coordination-in-the-loop verification and repair, which treats the coordination process itself as the verification environment. By combining static and behavioral evidence, the approach escalates repair from localized code fixes to corrected problem formulations and supports reuse of lessons across instances. Built on an ADMM-style consensus protocol, the resulting pipeline, OptiLoop, combines short, bounded coordination runs against a fixed reference counterparty, structured evidence extraction, and evidence-driven repair. On 40 held-out test scenarios, it improves objective match from 66.0% to 93.0% and social match from 68.5% to 89.0% relative to a strong local-validation baseline, while reducing mean objective gap from 15.3% to 3.5% and mean social gap from 7.6% to 2.0%.
📝 Abstract
Many decentralized decision problems require multiple parties to coordinate on shared decisions while keeping objectives, constraints, and data private. Large language models (LLMs) offer a promising way to lower the barrier to participation by generating local optimization agents from natural-language specifications. In coordination settings, however, executability is not enough: a generated agent may compile, solve, and pass local checks while still being semantically wrong, for example by misrepresenting costs, mis-scoping constraints, or responding incorrectly to incentives. Such errors often surface only during coordination, as systematic behavioral failures rather than infeasibility. We propose coordination-in-the-loop verification and repair for LLM-generated optimization agents. We instantiate this idea with an Alternating Direction Method of Multipliers (ADMM)-style consensus protocol and introduce OptiLoop, a pipeline that generates local optimization agents from text, verifies them through short, bounded coordination runs against a fixed reference counterparty, extracts structured behavioral and static evidence, and applies evidence-driven repair. When failures are structural rather than implementational, OptiLoop escalates from localized code fixes to corrected-formulation repair, and it can additionally reuse episodic lessons from prior instances. On 40 held-out test scenarios, OptiLoop-Full improves objective match from 66.0% to 93.0% and social match from 68.5% to 89.0% relative to a strong local-validation baseline, while reducing mean objective gap from 15.3% to 3.5% and mean social gap from 7.6% to 2.0%. These results show that, for generated optimization agents deployed inside decentralized decision loops, correctness should be validated in the loop itself rather than through isolated execution alone.
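To make the central claim concrete, the following is a minimal sketch of a bounded consensus run against a fixed reference counterparty. It is not OptiLoop itself: the scalar quadratic costs, the class `QuadraticAgent`, the function `bounded_consensus_run`, the penalty `rho`, the iteration cap, and the deviation threshold are all assumptions chosen for illustration. The faulty agent mis-scales its cost weight, so its local minimizer is unchanged (a local check on the minimizer would pass), yet the consensus it reaches with the reference counterparty shifts.

```python
from dataclasses import dataclass


@dataclass
class QuadraticAgent:
    """Hypothetical agent with local cost f(x) = 0.5 * weight * (x - target)**2."""
    weight: float
    target: float

    def local_update(self, z: float, u: float, rho: float) -> float:
        # Closed-form argmin of f(x) + (rho / 2) * (x - z + u)**2.
        return (self.weight * self.target + rho * (z - u)) / (self.weight + rho)


def bounded_consensus_run(agents, rho=1.0, max_iters=50):
    """Scaled-form consensus ADMM, stopped after a fixed number of iterations."""
    z = 0.0                      # shared consensus variable
    u = [0.0] * len(agents)      # scaled dual variables, one per agent
    for _ in range(max_iters):
        x = [a.local_update(z, u_i, rho) for a, u_i in zip(agents, u)]
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / len(agents)
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z


# Fixed reference counterparty (assumed known-good).
reference = QuadraticAgent(weight=1.0, target=0.0)
# Agent as intended by the specification (assumed ground truth for this sketch).
intended = QuadraticAgent(weight=1.0, target=4.0)
# Generated agent with a mis-scaled cost: same local minimizer (x = 4),
# but a different response to the coordination signal.
faulty = QuadraticAgent(weight=10.0, target=4.0)

z_intended = bounded_consensus_run([intended, reference])
z_faulty = bounded_consensus_run([faulty, reference])

# Behavioral evidence: deviation of the reached consensus from the intended one.
deviation = abs(z_faulty - z_intended)
flagged = deviation > 1e-3  # illustrative threshold, not from the paper
print(f"intended consensus={z_intended:.4f}, faulty consensus={z_faulty:.4f}, flagged={flagged}")
```

In isolation both agents minimize their cost at the same point, so only the coordination run separates them. In this toy setting the intended consensus is known in advance; how a real pipeline obtains its expected behavior is outside the scope of this sketch.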