StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-agent reinforcement learning (MARL) evaluation protocols rely on fixed, built-in AI opponents, which limits the diversity and generalizability of assessments. To address this, we propose the StarCraft II Battle Arena (SC2BA), the first open algorithm-vs-algorithm benchmark for MARL designed for fair competition. SC2BA introduces two adversarial paradigms, “dual-algorithm pairing” and “multi-algorithm mixing”, which enable systematic diagnosis of the efficacy, robustness, and scalability of state-of-the-art MARL methods. SC2BA is built on StarCraft II and is complemented by the open-source Adversarial PyMARL (APyMARL) library, which provides flexible interfaces and customizable control of policy behavior. We conduct cross-scenario adversarial evaluations of 12 representative MARL algorithms and identify critical performance bottlenecks, including sensitivity to opponent diversity, brittle credit assignment under mixed policies, and degraded scalability as the number of agents increases. All code, configurations, and reproducibility scripts are publicly released.

📝 Abstract
Deep multi-agent reinforcement learning (MARL) algorithms are booming in the field of collaborative intelligence, and the StarCraft multi-agent challenge (SMAC) is widely used as the benchmark therein. However, the imaginary opponents of MARL algorithms are in practice configured and controlled by a fixed built-in AI, which limits the diversity and versatility of algorithm evaluation. To address this issue, we establish a multi-agent algorithm-vs-algorithm environment, named StarCraft II battle arena (SC2BA), to refresh the benchmarking of MARL algorithms in an adversary paradigm. Using StarCraft as infrastructure, the SC2BA environment is built specifically for inter-algorithm adversary with fairness, usability, and customizability in mind, and an adversarial PyMARL (APyMARL) library with easy-to-use interfaces and modules is developed alongside it. Building on SC2BA, we benchmark classic MARL algorithms in two adversarial modes: dual-algorithm paired adversary and multi-algorithm mixed adversary. The former pits algorithms against each other pairwise, while the latter pits an algorithm against the mixed behaviors of a group of algorithms. The extensive benchmark experiments reveal some thought-provoking observations and problems concerning the effectiveness, sensitivity, and scalability of the benchmarked algorithms. The SC2BA environment and the reproduced experiments are released on GitHub (https://github.com/dooliu/SC2BA), and we believe this work could mark a new step for the MARL field in the coming years.
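The two adversarial modes can be made concrete with a small scheduling sketch. This is a toy model, not the SC2BA or APyMARL API. The `play` function is a hypothetical stand-in for one StarCraft II battle. The per-algorithm "strength" values and the names A, B, C are invented placeholders, not results. The mixed mode assumes the opponent algorithm is resampled from the pool each episode, whereas the paper's protocol may instead mix several algorithms within one team.

```python
import itertools
import random
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical signature for one battle: play(red, blue, rng) -> True if the
# red-side algorithm wins. In SC2BA this would be a full StarCraft II episode.
PlayFn = Callable[[str, str, random.Random], bool]


def toy_play(strength: Dict[str, float]) -> PlayFn:
    """Toy battle model (assumption): red wins with probability
    strength[red] / (strength[red] + strength[blue])."""
    def play(red: str, blue: str, rng: random.Random) -> bool:
        p_red = strength[red] / (strength[red] + strength[blue])
        return rng.random() < p_red
    return play


def paired_adversary(algos: Sequence[str], play: PlayFn, episodes: int,
                     seed: int = 0) -> Dict[Tuple[str, str], float]:
    """Dual-algorithm paired mode: every ordered (red, blue) pair of distinct
    algorithms fights `episodes` battles; returns red's win rate per pair."""
    rng = random.Random(seed)
    rates: Dict[Tuple[str, str], float] = {}
    for red, blue in itertools.permutations(algos, 2):
        wins = sum(play(red, blue, rng) for _ in range(episodes))
        rates[(red, blue)] = wins / episodes
    return rates


def mixed_adversary(learner: str, pool: Sequence[str], play: PlayFn,
                    episodes: int, seed: int = 0) -> float:
    """Multi-algorithm mixed mode (assumed protocol): each episode the opponent
    algorithm is drawn uniformly from `pool`, so `learner` faces a mixture of
    behaviors; returns the learner's overall win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(episodes):
        opponent = rng.choice(pool)
        wins += play(learner, opponent, rng)
    return wins / episodes


if __name__ == "__main__":
    # Placeholder algorithm names and invented strengths, not paper results.
    play = toy_play({"A": 3.0, "B": 1.0, "C": 2.0})
    for (red, blue), rate in paired_adversary(["A", "B", "C"], play, 500).items():
        print(f"{red} vs {blue}: red win rate {rate:.2f}")
    print("A vs mixed {B, C}:", mixed_adversary("A", ["B", "C"], play, 500))
```

Reading `rates[(x, y)]` alongside `rates[(y, x)]` separates each algorithm's result as red and as blue. This is one simple way to check whether either side has an advantage. The abstract does not say whether SC2BA reports results this way.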
Problem

Research questions and friction points this paper is trying to address.

Establishes a multi-agent algorithm-vs-algorithm environment for adversarial benchmarking
Addresses the lack of opponent diversity when evaluating MARL algorithms
Benchmarks classic MARL algorithms in dual and multi-algorithm adversarial modes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Establishes an algorithm-vs-algorithm environment for adversarial benchmarking
Develops the adversarial PyMARL (APyMARL) library with easy-to-use interfaces
Benchmarks algorithms in dual-algorithm and multi-algorithm adversarial modes
Yadong Li
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Tong Zhang
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Bo Huang
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Zhen Cui
Beijing Normal University
Pattern Recognition and Computer Vision