🤖 AI Summary
This work addresses stagnation in strategy evolution for adversarial multi-agent games, where the evaluation landscape shifts as strategies improve and fixed evaluators become unreliable. The authors propose FAMOU, a framework that brings co-evolution into large language model (LLM)-driven, code-level strategy evolution. FAMOU combines three mechanisms: evaluator co-evolution, which adds discovered champions to the opponent pool; hierarchical deep evaluation, which replaces noisy few-game scores with statistically reliable assessments; and weakness pressure, which dynamically up-weights the most difficult opponents. Together these sustain optimization on the MCTF 2026 3v3 maritime capture-the-flag task. The LLM mutation process generates tactical structures absent from the seed strategies, such as lookahead search and adaptive interception, and the evolved strategy transfers to physical hardware. FAMOU achieves the highest combined score (0.526) and a 61.7% win rate against unseen opponents, and it placed 1st in the hardware round-robin and 3rd in simulation at the AAMAS 2026 MCTF Competition.
📝 Abstract
Recent advances in LLM-driven code evolution have enabled automated discovery by iteratively generating and improving programs. However, applying these methods to adversarial multi-agent games introduces a fundamental challenge: the evaluation landscape shifts as strategies improve, causing fixed evaluators to become unreliable and evolution to stagnate. We propose three mechanisms to address this challenge: evaluator co-evolution, which incorporates discovered champions into the opponent pool; hierarchical deep evaluation, which replaces noisy few-game scores with statistically reliable assessments; and weakness pressure, which dynamically up-weights the most difficult opponents to break through plateaus. We implement these mechanisms within FAMOU, a framework built upon the same foundation-model code-evolution paradigm as OpenEvolve and ShinkaEvolve. On the MCTF 2026 3v3 maritime capture-the-flag task, FAMOU consistently outperforms both baselines under two backbone LLMs, achieving the highest combined score (0.526) and the best generalization to unseen opponents (61.7% win rate), while ablations confirm that each mechanism contributes to performance. Notably, the LLM mutation process generates tactical structures entirely absent from the seed strategies, including lookahead search and adaptive interception, demonstrating that code-level evolution can produce nontrivial algorithmic innovations in adversarial settings. The FAMOU-evolved strategy further achieved 1st place in the hardware round-robin and 3rd in simulation at the AAMAS 2026 MCTF Competition, validating its real-world transferability. The optimized implementation and the corresponding evaluation code developed through our evolutionary process are available at: https://github.com/1xiangliu1/FAMOU-CoEvo
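To make the three mechanisms named in the abstract more concrete, the following is a minimal Python sketch of how they might fit together. It is not the FAMOU implementation. Every name here (`OpponentPool`, `hierarchical_evaluate`, `weakness_weights`, the `play` callback) is hypothetical, and so are the two-stage screening gate, the game counts, and the softmax weighting with a temperature; the abstract does not specify any of these details.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical match callback: returns True if `candidate` beats `opponent`.
PlayMatch = Callable[[str, str, random.Random], bool]


@dataclass
class OpponentPool:
    """Opponent pool that grows as champions are found (evaluator co-evolution)."""
    opponents: List[str] = field(default_factory=list)

    def add_champion(self, name: str) -> None:
        # Avoid duplicate entries so a repeated champion is not double-counted.
        if name not in self.opponents:
            self.opponents.append(name)


def win_rate(candidate: str, opponent: str, play: PlayMatch,
             games: int, rng: random.Random) -> float:
    """Fraction of `games` that the candidate wins against one opponent."""
    wins = sum(play(candidate, opponent, rng) for _ in range(games))
    return wins / games


def hierarchical_evaluate(candidate: str, pool: OpponentPool, play: PlayMatch,
                          rng: random.Random, quick_games: int = 4,
                          deep_games: int = 40, gate: float = 0.5) -> Dict[str, float]:
    """Assumed two-tier scheme: a cheap screen, then a many-game deep pass.

    Candidates whose mean quick win rate falls below `gate` keep their noisy
    quick scores; the rest are re-scored with `deep_games` games per opponent.
    """
    quick = {o: win_rate(candidate, o, play, quick_games, rng) for o in pool.opponents}
    if sum(quick.values()) / len(quick) < gate:
        return quick
    return {o: win_rate(candidate, o, play, deep_games, rng) for o in pool.opponents}


def weakness_weights(rates: Dict[str, float], temperature: float = 0.2) -> Dict[str, float]:
    """Softmax over (1 - win rate): the hardest opponents receive the most weight."""
    scores = {o: math.exp((1.0 - r) / temperature) for o, r in rates.items()}
    total = sum(scores.values())
    return {o: s / total for o, s in scores.items()}


def weighted_score(rates: Dict[str, float], weights: Dict[str, float]) -> float:
    """Fitness as the weighted mean win rate across the opponent pool."""
    return sum(weights[o] * rates[o] for o in rates)
```

In an evolution loop built on this sketch, each new candidate would be scored with `hierarchical_evaluate`, its fitness computed with `weighted_score` under `weakness_weights`, and any candidate that becomes the new best would be passed to `add_champion` so that later candidates must also beat it. The softmax is only one plausible way to up-weight difficult opponents; a rank-based or thresholded scheme would serve the same illustrative purpose.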