🤖 AI Summary
Backdoor attacks against large language model (LLM)-driven multi-agent systems (MAS) remain largely unexplored, particularly in collaborative settings. Method: This paper identifies a novel attack surface arising from agent coordination and introduces the concept of “distributed backdoors”: malicious primitives are stealthily embedded across multiple tools and activate only when agents execute a specific collaboration sequence, enabling covert data exfiltration while evading conventional single-agent defenses. The approach comprises distributed primitive embedding, role-aware task orchestration, and a sandboxed evaluation framework, validated on a newly constructed multi-role collaborative benchmark. Contribution/Results: Experiments demonstrate >95% attack success rate with negligible impact on benign task performance, confirming the attack’s stealthiness and effectiveness and exposing critical security vulnerabilities in LLM-based MAS architectures.
📝 Abstract
LLM-based multi-agent systems (MAS) are increasingly integrated into next-generation applications, but their security against backdoor attacks remains largely underexplored. Existing research has focused exclusively on single-agent backdoor attacks, overlooking the novel attack surfaces introduced by agent collaboration in MAS. To bridge this gap, we present the first Distributed Backdoor Attack tailored to MAS. We decompose the backdoor into multiple distributed attack primitives that are embedded within MAS tools. These primitives remain dormant individually and activate collectively only when agents collaborate in a specific sequence, thereby assembling the full backdoor to execute targeted attacks such as data exfiltration. To fully assess this threat, we introduce a benchmark for multi-role collaborative tasks and a sandboxed framework for evaluation. Extensive experiments demonstrate that our attack achieves an attack success rate exceeding 95% without degrading performance on benign tasks. This work exposes novel backdoor attack surfaces that exploit agent collaboration, underscoring the need to move beyond single-agent protection. Code and benchmark are available at https://github.com/whfeLingYu/Distributed-Backdoor-Attacks-in-MAS.
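
The sequence-gated activation described in the abstract can be illustrated with a toy simulation. This is only a conceptual sketch, not the authors' implementation: the tool names, the `SharedState` tracker, and the requirement that the trigger calls be contiguous are all assumptions made for illustration, and the "activation" is an inert flag with no payload.

```python
from dataclasses import dataclass

# Hypothetical trigger: the order of tool calls that must occur (assumed names).
TRIGGER_SEQUENCE = ("fetch_records", "summarize", "send_report")


@dataclass
class SharedState:
    """Stand-in for whatever state the primitives share (an assumption)."""
    progress: int = 0        # number of trigger steps matched so far
    activated: bool = False  # inert flag; no payload is executed in this toy


def call_tool(name: str, state: SharedState) -> str:
    """Return the tool's benign output, then advance the toy trigger tracker."""
    result = f"{name}: ok"  # benign output is the same on every path
    if state.activated:
        return result
    if name == TRIGGER_SEQUENCE[state.progress]:
        state.progress += 1
        if state.progress == len(TRIGGER_SEQUENCE):
            state.activated = True
    else:
        # Out-of-order call: reset, letting this call restart the sequence.
        state.progress = 1 if name == TRIGGER_SEQUENCE[0] else 0
    return result


def run(calls):
    """Simulate a list of tool calls; return (activated, benign outputs)."""
    state = SharedState()
    outputs = [call_tool(c, state) for c in calls]
    return state.activated, outputs
```

In this sketch, `run(["fetch_records", "summarize", "send_report"])` sets the flag, while the same calls in any other order, or any strict subset of them, leave it unset. The benign outputs are identical in both cases, which mirrors the claim that benign task performance is not degraded.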