AI Summary
This work studies persistent poisoning attacks on web agents that rely on multimodal external memory, in which malicious content injected into memory is repeatedly retrieved and used to manipulate agent behavior. The paper introduces MemVenom, the first black-box attack framework targeting graph-structured multimodal memory. MemVenom uses a two-stage strategy: a trigger-conditioned retrieval attack first ensures that poisoned memories are recalled with high probability, and a post-retrieval stage then combines adversarial image perturbations with stealthy OCR-based text injection to override the user's original intent. The method requires no modification of model parameters, is goal-agnostic, and transfers well across architectures and model scales. Experiments across multiple web-agent frameworks and vision-language models show end-to-end attack success rates of up to 99.15% (on GPT-5-family web agents) with minimal impact on benign task performance.
Abstract
External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected into memory can be persistently recalled and repeatedly influence agent behavior. In this work, we identify and systematically study multimodal memory poisoning, an overlooked yet practical attack surface in web-agent systems. We propose MemVenom, a unified black-box attack framework that poisons graph-structured external memory with coordinated text-image evidence. Our method consists of a two-stage design: (1) a trigger-conditioned retrieval attack that ensures high-probability recall of malicious memory, and (2) a post-retrieval attack induction that leverages adversarial perturbations and stealthy OCR injection to override the original user objective. Unlike prior attacks that operate on prompts or text-only memory, our approach enables persistent, reusable, and goal-agnostic attacks without modifying model parameters or re-optimizing malicious tasks. Experiments across multiple web-agent frameworks and vision-language models demonstrate that MemVenom achieves strong end-to-end attack success with minimal impact on benign performance, reaching up to 99.15% on GPT-5-family web agents, while transferring effectively across architectures and model scales.