From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

📅 2026-06-02
📈 Citations: 0
Influential: 0
📄 PDF

career value

196K/year
🤖 AI Summary
This study addresses the critical security vulnerability of memory poisoning in persistent memory mechanisms of large language model (LLM) agents, where a single malicious write can exert long-term control over agent behavior. The work systematically identifies four memory write channels and nine structural vulnerabilities, proposes the first six-category attack taxonomy specifically targeting this threat, and introduces MPBench, a dedicated evaluation benchmark. Through adversarial input construction, system prompt analysis, and agent architecture auditing, the research demonstrates that agents actively reading from and writing to memory are significantly more susceptible to such attacks. Furthermore, it reveals that existing prompt injection defenses offer little to no protection against these memory-based exploits.
📝 Abstract
Memory is a core component of AI agents, enabling them to accumulate knowledge across interactions and improve performance. However, persistent memory introduces the risk of memory poisoning, where a single adversarial memory write can exert long-term influence over agent behavior. We present a systematic study of memory poisoning in LLM-based agents. We identify four memory write channels and nine structural vulnerabilities in model capabilities, system prompt design, and agent system architecture that make these channels exploitable. Based on these vulnerabilities, we develop a taxonomy of six classes of memory poisoning attacks. Furthermore, we design MPBench -- a benchmark for evaluating memory poisoning attacks, and show that agents designed to write and retrieve memory more aggressively are more exploitable. We also show that existing prompt injection defenses fail to cover memory poisoning attacks. Our findings provide a foundation for understanding and mitigating memory poisoning attacks against AI agents.
Problem

Research questions and friction points this paper is trying to address.

memory poisoning
LLM agents
adversarial attacks
trusted memory
AI security
Innovation

Methods, ideas, or system contributions that make the work stand out.

memory poisoning
LLM agents
adversarial attacks
MPBench
systematic vulnerability analysis
🔎 Similar Papers
No similar papers found.
💼 Related Jobs
P
Pritam Dash
Huawei Turing Research Center, Canada
T
Tongyu Ge
Huawei Turing Research Center, Canada
A
Aditi Jain
Huawei Turing Research Center, Canada
T
Tanmay Shah
University of Waterloo, Canada
Zhiwei Shang
Zhiwei Shang
The Chinese University of Hong Kong, Shenzhen
Robot LearningReinforcement Learning