Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems

📅 2025-10-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Prior work on backdoor attacks against large language model (LLM)-driven multi-agent systems (MAS) remains unexplored, particularly in collaborative settings. Method: This paper identifies a novel attack surface arising from agent coordination and introduces the concept of “distributed backdoors”: malicious primitives are stealthily embedded across multiple tools and activated only upon execution of specific inter-agent collaboration sequences—enabling covert data exfiltration while evading conventional single-agent defenses. Our approach comprises distributed primitive embedding, role-aware task orchestration, and a sandboxed evaluation framework, validated on a newly constructed multi-role collaborative benchmark. Contribution/Results: Experiments demonstrate >95% attack success rate with negligible impact on benign task performance, confirming the attack’s high stealthiness and effectiveness—thereby exposing critical security vulnerabilities in LLM-based MAS architectures.

Technology Category

Application Category

📝 Abstract
LLM-based multi-agent systems (MAS) demonstrate increasing integration into next-generation applications, but their safety in backdoor attacks remains largely underexplored. However, existing research has focused exclusively on single-agent backdoor attacks, overlooking the novel attack surfaces introduced by agent collaboration in MAS. To bridge this gap, we present the first Distributed Backdoor Attack tailored to MAS. We decompose the backdoor into multiple distributed attack primitives that are embedded within MAS tools. These primitives remain dormant individually but collectively activate only when agents collaborate in a specific sequence, thereby assembling the full backdoor to execute targeted attacks such as data exfiltration. To fully assess this threat, we introduce a benchmark for multi-role collaborative tasks and a sandboxed framework to evaluate. Extensive experiments demonstrate that our attack achieves an attack success rate exceeding 95% without degrading performance on benign tasks. This work exposes novel backdoor attack surfaces that exploit agent collaboration, underscoring the need to move beyond single-agent protection. Code and benchmark are available at https://github.com/whfeLingYu/Distributed-Backdoor-Attacks-in-MAS.
Problem

Research questions and friction points this paper is trying to address.

Addressing underexplored backdoor attack risks in multi-agent systems
Overcoming limitations of single-agent focused security research
Mitigating distributed attack primitives activated through agent collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed backdoor attack for multi-agent systems
Dormant primitives activate via agent collaboration
Achieves 95% attack success without performance degradation
🔎 Similar Papers
2024-02-12arXiv.orgCitations: 11
Pengyu Zhu
Pengyu Zhu
North China Electric Power University
Artificial IntelligenceBrain-Computer InterfaceAI for SciencePattern Recognition
L
Lijun Li
Shanghai Artificial Intelligence Laboratory
Y
Yaxing Lyu
Xiamen University Malaysia
L
Li Sun
North China Electric Power University
S
Sen Su
Beijing University of Posts and Telecommunications
Jing Shao
Jing Shao
Research Scientist, Shanghai AI Laboratory/Shanghai Jiao Tong University
Computer VisionMulti-Modal Large Language Model