CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

📅 2026-04-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the vulnerability of large language models to continuously evolving adversarial attacks in multi-turn interactions, a challenge exacerbated by the limited dynamic adaptability of existing defense mechanisms. To overcome this limitation, we propose the first state-memory-based multi-agent collaborative defense framework, wherein a System Agent dynamically orchestrates three specialized agents—Deferring, Tempting, and Forensic—to jointly maintain and update a shared defensive state, enabling proactive and sustained responses to multi-round attacks. We also introduce EMRA, a novel benchmark for evaluating multi-turn adversarial attacks. Experimental results demonstrate that our approach significantly outperforms state-of-the-art defenses on EMRA, reducing attack success rates by 78.9%, increasing deception detection rates by 186%, and decreasing attack efficiency by 167.9%.
📝 Abstract
As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety concerns, especially those evolving over multi-round interactions. Existing defenses are largely reactive and struggle to adapt as adversaries refine strategies across rounds. In this work, we propose CoopGuard , a stateful multi-round LLM defense framework based on cooperative agents that maintains and updates an internal defense state to counter evolving attacks. It employs three specialized agents (Deferring Agent, Tempting Agent, and Forensic Agent) for complementary round-level strategies, coordinated by System Agent, which conditions decisions on the evolving defense state (interaction history) and orchestrates agents over time. To evaluate evolving threats, we introduce the EMRA benchmark with 5,200 adversarial samples across 8 attack types, simulating progressively LLM multi-round attacks. Experiments show that CoopGuard reduces attack success rate by 78.9% over state-of-the-art defenses, while improving deceptive rate by 186% and reducing attack efficiency by 167.9%, offering a more comprehensive assessment of multi-round defense. These results demonstrate that CoopGuard provides robust protection for LLMs in multi-round adversarial scenarios.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Adversarial Attacks
Multi-Round Interactions
Evolving Threats
LLM Safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

stateful defense
cooperative agents
multi-round adversarial attacks
LLM safety
adaptive security
🔎 Similar Papers
No similar papers found.
Siyuan Li
Siyuan Li
Shanghai Jiao Tong University
Trustworthy LLM AgentsEdge Intelligence
Z
Zehao Liu
School of Computer Science, Shanghai Jiao Tong University
X
Xi Lin
School of Computer Science, Shanghai Jiao Tong University
Qinghua Mao
Qinghua Mao
Shanghai Jiao Tong University
Graph Neural NetworksTrustworthy AILarge Language ModelsRetrieval Augmented Generation
Yuliang Chen
Yuliang Chen
University of California, San Diego
Self-Supervised LearningMultimodal Learning
Haoyu Li
Haoyu Li
Student, UIUC
Machine Learning
J
Jun Wu
School of Computer Science, Shanghai Jiao Tong University
J
Jianhua Li
School of Computer Science, Shanghai Jiao Tong University
X
Xiu Su
Big Data Institute, Central South University