🤖 AI Summary
LLM agents, whether open-source or commercial, are deployed with integrated memory, retrieval, web access, and API-calling modules. These components introduce security and privacy vulnerabilities that conventional model-centric safety research largely overlooks.
Method: We propose a multi-dimensional attack taxonomy specifically designed for LLM agent architectures, identifying previously unrecognized entry points and observability-driven vulnerability patterns absent in standalone model settings. Our lightweight black-box injection and observation attacks require no model access or ML expertise, ensuring compatibility with both open-source and proprietary agent systems.
Contribution/Results: We demonstrate these attacks on popular open-source and commercial agent platforms, achieving sensitive context exfiltration, execution flow hijacking, and malicious behavior induction. The experiments illustrate the immediate practical implications of these threats, provide a systematic analysis of agent-specific security risks, and highlight critical gaps in current LLM safety paradigms.
📝 Abstract
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs). These attacks may extract private information or coerce the model into producing harmful outputs. In real-world deployments, LLMs are often part of a larger agentic pipeline including memory systems, retrieval, web access, and API calling. Such additional components introduce vulnerabilities that make these LLM-powered agents much easier to attack than isolated LLMs, yet relatively little work focuses on the security of LLM agents. In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents. We first provide a taxonomy of attacks categorized by threat actors, objectives, entry points, attacker observability, attack strategies, and inherent vulnerabilities of agent pipelines. We then conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities. Notably, our attacks are trivial to implement and require no understanding of machine learning.
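The six taxonomy dimensions listed in the abstract can be pictured as a simple record type. The following is only an illustrative sketch, not the authors' formalization: the names `AttackProfile` and `EntryPoint`, the choice of Python, and every field value in the example are assumptions. Only the dimension names and the agent components (memory, retrieval, web access, API calling) come from the abstract.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List


class EntryPoint(Enum):
    """Agent components named in the abstract; the enum itself is hypothetical."""
    MEMORY = "memory"
    RETRIEVAL = "retrieval"
    WEB_ACCESS = "web_access"
    API_CALLING = "api_calling"


@dataclass(frozen=True)
class AttackProfile:
    """One attack described along the six taxonomy dimensions of the abstract."""
    threat_actor: str
    objective: str
    entry_point: EntryPoint
    attacker_observability: str
    attack_strategy: str
    inherent_vulnerability: str


def taxonomy_dimensions() -> List[str]:
    """Return the dimension names in declaration order."""
    return [f.name for f in fields(AttackProfile)]


# Hypothetical example values, chosen only to show how a profile is filled in.
example = AttackProfile(
    threat_actor="third party who controls a web page the agent reads",
    objective="extract private information",
    entry_point=EntryPoint.WEB_ACCESS,
    attacker_observability="black-box: sees only the agent's outward behavior",
    attack_strategy="instructions embedded in retrieved content",
    inherent_vulnerability="agent treats retrieved text as trusted input",
)

if __name__ == "__main__":
    for name in taxonomy_dimensions():
        print(f"{name}: {getattr(example, name)}")
```

Describing each attack as one record makes it easy to compare attacks dimension by dimension, for instance by grouping them by `entry_point`.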