Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems

📅 2025-10-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Backdoor attacks against large language model (LLM)-driven multi-agent systems (MAS) remain largely unexplored, particularly in collaborative settings, because prior work has targeted single agents. Method: This paper identifies a novel attack surface arising from agent coordination and introduces the concept of “distributed backdoors”: malicious primitives are stealthily embedded across multiple tools and activate only when agents execute a specific inter-agent collaboration sequence, enabling covert data exfiltration while evading conventional single-agent defenses. The approach comprises distributed primitive embedding, role-aware task orchestration, and a sandboxed evaluation framework, validated on a newly constructed multi-role collaborative benchmark. Contribution/Results: Experiments show an attack success rate above 95% with negligible impact on benign task performance, demonstrating that the attack is both stealthy and effective and exposing critical security vulnerabilities in LLM-based MAS architectures.

📝 Abstract

LLM-based multi-agent systems (MAS) are increasingly integrated into next-generation applications, but their safety against backdoor attacks remains largely underexplored. Existing research has focused exclusively on single-agent backdoor attacks, overlooking the novel attack surfaces introduced by agent collaboration in MAS. To bridge this gap, we present the first Distributed Backdoor Attack tailored to MAS. We decompose the backdoor into multiple distributed attack primitives that are embedded within MAS tools. These primitives remain dormant individually but collectively activate only when agents collaborate in a specific sequence, thereby assembling the full backdoor to execute targeted attacks such as data exfiltration. To fully assess this threat, we introduce a benchmark for multi-role collaborative tasks and a sandboxed framework for evaluation. Extensive experiments demonstrate that our attack achieves an attack success rate exceeding 95% without degrading performance on benign tasks. This work exposes novel backdoor attack surfaces that exploit agent collaboration, underscoring the need to move beyond single-agent protection. Code and benchmark are available at https://github.com/whfeLingYu/Distributed-Backdoor-Attacks-in-MAS.
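The activation condition described in the abstract, where fragments stay dormant until agents collaborate in a specific order, can be pictured as a small state machine. The Python sketch below is a conceptual toy under stated assumptions, not the authors' implementation: the role and tool names in TRIGGER_SEQUENCE are hypothetical, allowing unrelated calls between trigger steps (a subsequence match) is an assumption, and no payload is modeled. The gate only reports whether the fragments have been "assembled".

```python
from dataclasses import dataclass

# Hypothetical trigger: ordered (agent role, tool) steps whose completion
# "assembles" the backdoor. Roles and tool names are invented for illustration.
TRIGGER_SEQUENCE = [
    ("planner", "search_docs"),
    ("coder", "write_file"),
    ("reviewer", "post_summary"),
]


@dataclass
class SequenceGate:
    """Toy model of sequence-gated activation (hypothetical, not the paper's code).

    Each fragment is inert on its own; the gate only reports 'assembled'
    after every step in `sequence` has been observed in order. Unrelated
    calls in between are ignored (an assumed subsequence match).
    """

    sequence: list
    progress: int = 0

    def observe(self, role: str, tool: str) -> bool:
        # Advance only when the next expected (role, tool) pair appears.
        if not self.assembled and (role, tool) == self.sequence[self.progress]:
            self.progress += 1
        return self.assembled

    @property
    def assembled(self) -> bool:
        return self.progress == len(self.sequence)


def run_trace(trace, sequence=TRIGGER_SEQUENCE) -> bool:
    """Replay a multi-agent tool-call trace; report whether the gate assembled."""
    gate = SequenceGate(list(sequence))
    for role, tool in trace:
        gate.observe(role, tool)
    return gate.assembled


# No single step completes the trigger; only the full ordered trace does.
print([run_trace([step]) for step in TRIGGER_SEQUENCE])  # [False, False, False]
print(run_trace(TRIGGER_SEQUENCE))                        # True
print(run_trace(list(reversed(TRIGGER_SEQUENCE))))        # False
```

Because the gate's state is carried across the whole trace, no individual call completes the trigger on its own. This is consistent with the abstract's point that protection scoped to a single agent would presumably need visibility into the cross-agent sequence to notice anything.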

Problem

Research questions and friction points this paper is trying to address.

Addressing underexplored backdoor attack risks in multi-agent systems

Overcoming limitations of single-agent focused security research

Exposing distributed attack primitives that activate only through agent collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed backdoor attack for multi-agent systems

Dormant primitives activate via agent collaboration

Achieves over 95% attack success rate without degrading benign-task performance
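The headline result rests on two ratios. The Python sketch below uses the conventional definitions, which may differ in detail from the paper's. Attack success rate (ASR) is taken as the fraction of trigger-executing runs in which the target behavior occurs, and benign degradation as the drop in task success between a clean and a backdoored system. The function names and the trial counts in the example are hypothetical and are not results from the paper.

```python
from typing import Sequence


def rate(outcomes: Sequence[bool]) -> float:
    """Fraction of successful trials; an empty list is rejected."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


def attack_success_rate(triggered_runs: Sequence[bool]) -> float:
    # True = the target behavior occurred in a run that executed the trigger sequence.
    return rate(triggered_runs)


def benign_degradation(clean_runs: Sequence[bool], backdoored_runs: Sequence[bool]) -> float:
    # Positive values mean the backdoored system solves fewer benign tasks.
    return rate(clean_runs) - rate(backdoored_runs)


# Hypothetical counts for illustration only (not the paper's data).
triggered = [True] * 96 + [False] * 4
clean = [True] * 90 + [False] * 10
backdoored = [True] * 90 + [False] * 10
print(attack_success_rate(triggered))         # 0.96
print(benign_degradation(clean, backdoored))  # 0.0
```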

🔎 Similar Papers

Secret Collusion Among Generative AI Agents