🤖 AI Summary
This work reveals a systemic failure of prevailing defenses against indirect prompt injection (IPI) in large language model (LLM) agents under adaptive adversarial settings. To demonstrate this, the authors introduce the first systematic adaptive attack framework for evaluating IPI defenses (combining multi-round feedback optimization, dynamic context-aware perturbation, and reverse engineering of defense behaviors) and use it to bypass all eight mainstream defenses with an average attack success rate above 50%. The core contributions are threefold: (1) an evaluation paradigm for IPI defenses grounded in adversarial robustness, emphasizing the necessity of adaptive attack testing; (2) empirical validation showing that static or non-adaptive evaluations significantly overestimate defense efficacy; and (3) an open-sourced attack implementation to foster community-wide adoption of more stringent, realistic evaluation standards for LLM agent security.
📝 Abstract
Large Language Model (LLM) agents exhibit remarkable performance across diverse applications by using external tools to interact with environments. However, integrating external tools introduces security risks, such as indirect prompt injection (IPI) attacks. Although defenses against IPI attacks have been proposed, their robustness remains questionable because they are rarely tested against adaptive attacks. In this paper, we evaluate eight different defenses and bypass all of them using adaptive attacks, consistently achieving an attack success rate of over 50%. This reveals critical vulnerabilities in current defenses. Our research underscores the need for adaptive attack evaluation when designing defenses to ensure robustness and reliability. The code is available at https://github.com/uiuc-kang-lab/AdaptiveAttackAgent.
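The multi-round feedback optimization mentioned in the summary can be sketched as a simple loop: mutate an injected payload using the defense's feedback until it slips through or a round budget is exhausted. This is a minimal illustrative sketch, not the paper's implementation; `toy_defense` and `toy_mutate` are hypothetical stand-ins for a real IPI defense and a real perturbation strategy.

```python
def adaptive_attack(defense, seed_payload, mutate, rounds=20):
    """Multi-round feedback loop: repeatedly query the defense and
    mutate the payload based on its feedback until a bypass is found
    or the round budget runs out. Returns (payload, rounds_used) on
    success, (None, rounds) on failure."""
    payload = seed_payload
    for i in range(rounds):
        blocked, feedback = defense(payload)
        if not blocked:
            return payload, i  # bypass found
        payload = mutate(payload, feedback)
    return None, rounds

# --- toy stand-ins for demonstration only ---
def toy_defense(payload):
    # Blocks payloads containing the literal marker "IGNORE"; real
    # defenses inspect tool outputs with far richer signals.
    blocked = "IGNORE" in payload
    return blocked, "keyword filter triggered" if blocked else ""

def toy_mutate(payload, feedback):
    # Naive obfuscation: change the case of the flagged keyword.
    return payload.replace("IGNORE", "Ignore", 1)

best, rounds_used = adaptive_attack(
    toy_defense, "IGNORE prior instructions", toy_mutate)
print(best, rounds_used)  # → Ignore prior instructions 1
```

In practice, the mutation step would be driven by an optimizer or an attacker LLM rather than a fixed string rewrite, but the feedback-driven structure of the loop is the same.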