🤖 AI Summary
Existing multi-agent reinforcement learning (MARL) evaluation protocols rely on fixed, built-in AI opponents, which limits the diversity and generalizability of assessments. To address this, we propose the StarCraft II Battle Arena (SC2BA), the first open, algorithm-vs-algorithm fair-competition benchmark for MARL. SC2BA introduces two novel adversarial paradigms, “dual-algorithm pairing” and “multi-algorithm mixing”, which enable systematic diagnosis of the efficacy, robustness, and scalability of state-of-the-art MARL methods. Built on StarCraft II, SC2BA is complemented by the open-source Adversarial PyMARL (APyMARL) library, which provides flexible interfaces and customizable control of policy behavior. We conduct cross-scenario adversarial evaluations of 12 representative MARL algorithms and identify critical performance bottlenecks, including sensitivity to opponent diversity, brittle credit assignment under mixed policies, and degraded scalability as the number of agents grows. All code, configurations, and reproducibility scripts are publicly released.
📝 Abstract
Deep multi-agent reinforcement learning (MARL) algorithms are booming in the field of collaborative intelligence, and the StarCraft Multi-Agent Challenge (SMAC) is widely used as a benchmark for them. However, the opponents faced by MARL algorithms are in practice configured and controlled by a fixed built-in AI, which limits the diversity and versatility of algorithm evaluation. To address this issue, we establish a multi-agent algorithm-vs-algorithm environment, named StarCraft II Battle Arena (SC2BA), to refresh the benchmarking of MARL algorithms in an adversarial paradigm. Using StarCraft II as its infrastructure, the SC2BA environment is designed specifically for inter-algorithm adversarial play with fairness, usability, and customizability in mind; alongside it, we develop an adversarial PyMARL (APyMARL) library with easy-to-use interfaces and modules. Using SC2BA, we benchmark classic MARL algorithms in two adversarial modes: dual-algorithm paired adversary and multi-algorithm mixed adversary. The former pits pairs of algorithms against each other, while the latter pits an algorithm against the multiple behaviors of a group of algorithms. The extensive benchmark experiments reveal thought-provoking observations and problems concerning the effectiveness, sensitivity, and scalability of the evaluated algorithms. The SC2BA environment and the reproduced experiments are released on GitHub (https://github.com/dooliu/SC2BA), and we believe this work could mark a new step for the MARL field in the coming years.
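To make the difference between the two adversarial modes concrete, here is a minimal Python sketch of how match schedules for each mode might be enumerated. It is not the APyMARL API. The function names `paired_schedule` and `mixed_schedule`, the choice to play every pair from both sides, and the per-episode resampling of the opponent's controlling algorithm are all illustrative assumptions. The algorithm names are likewise used only as placeholders.

```python
import random
from itertools import permutations

# Placeholder algorithm names, for illustration only.
ALGORITHMS = ["QMIX", "VDN", "QTRAN"]


def paired_schedule(algorithms):
    """Dual-algorithm paired adversary (hypothetical sketch).

    Returns every ordered (red, blue) pair of distinct algorithms, so each
    pair of algorithms plays once from each side (an assumed fairness choice).
    """
    return list(permutations(algorithms, 2))


def mixed_schedule(learner, opponent_pool, n_episodes, seed=0):
    """Multi-algorithm mixed adversary (hypothetical sketch).

    The learner faces one opponent per episode, and the algorithm controlling
    that opponent is resampled from the pool each episode (an assumption
    about how "multiple behaviors from a group of algorithms" is realized).
    """
    rng = random.Random(seed)
    return [(learner, rng.choice(opponent_pool)) for _ in range(n_episodes)]


if __name__ == "__main__":
    for red, blue in paired_schedule(ALGORITHMS):
        print(f"{red} vs {blue}")
    pool = [a for a in ALGORITHMS if a != "QMIX"]
    print(mixed_schedule("QMIX", pool, n_episodes=4))
```

Playing each pair from both sides is one simple way to offset positional asymmetry on a map; the fairness mechanisms actually used in SC2BA may differ.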