🤖 AI Summary
Current personal AI agents rely heavily on semantic similarity for long-term memory retrieval, which introduces trustworthiness risks such as cross-domain leakage, sycophancy, tool-call drift, and memory-induced jailbreaks. To address this, the work proposes MemGate, a lightweight, task-conditioned neural gating mechanism (9M parameters, 35.1MB) that reframes memory retrieval as trust-aware access control. MemGate sits between the vector memory store and the large language model and modifies neither component, offering what is presented as the first task-intent-based approach to memory filtering. Evaluated on mainstream memory frameworks (A-Mem, Mem0, MemOS) and the real-world agent environment OpenClaw, MemGate reduces memory-induced threats while preserving memory utility, and it does so across frameworks, agent settings, and LLM backbones without requiring changes to the deployed pipeline.
📝 Abstract
Personal AI agents increasingly rely on long-term memory to provide persistent personalization across sessions. However, existing memory pipelines are largely driven by semantic similarity: memory data close to the current query is retrieved and injected into the model context. This creates a critical trustworthiness gap, since a semantically related memory may still be contextually inappropriate, leading to threats such as cross-domain leakage, sycophancy, tool-call drift, or memory-induced jailbreaks.
In this paper, we study memory search as a trust boundary in personal AI agents. We evaluate representative agentic memory frameworks, including A-Mem, Mem0, and MemOS, together with OpenClaw, a real-world personal-agent environment with persistent state and tool-use capability. Our results show that long-term memory is not merely a utility layer, but a durable control channel that can reshape how agents interpret tasks and execute actions, leaving them highly susceptible to the aforementioned threats. To mitigate these vulnerabilities, we propose MemGate, a lightweight and deployable memory plug-in for trustworthy memory search, with only 9M parameters and a 35.1MB footprint. MemGate is inserted between the vector memory store and the backbone LLM, requiring no LLM modification, memory-database rewriting, or inference-time LLM judge. It applies a query-conditioned neural gate to candidate memory representations, turning raw similarity search into task-conditioned memory admission. Across multiple mainstream memory frameworks, real-world agent settings, and diverse LLM backbones, MemGate reduces memory-induced threats while preserving long-term memory utility.
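The contrast between raw similarity search and task-conditioned admission can be illustrated with a toy sketch. This is not the paper's architecture: the bilinear gate, the hand-set weight matrix `W`, the bias `B`, the three-dimensional embeddings (health topic, work context, personal context), and all function names are assumptions chosen only to make the idea concrete.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Memory = Tuple[str, Vector]  # (text, embedding)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_by_similarity(query: Vector, store: List[Memory], k: int) -> List[Memory]:
    """Baseline retrieval: the k memories closest to the query."""
    return sorted(store, key=lambda m: cosine(query, m[1]), reverse=True)[:k]


def gate_score(query: Vector, memory: Vector,
               weights: Sequence[Vector], bias: float) -> float:
    """Hypothetical gate: sigmoid(query^T W memory + bias)."""
    z = bias + sum(q * w * m
                   for q, row in zip(query, weights)
                   for w, m in zip(row, memory))
    return 1.0 / (1.0 + math.exp(-z))


def admit(query: Vector, candidates: List[Memory],
          weights: Sequence[Vector], bias: float,
          threshold: float = 0.5) -> List[Memory]:
    """Keep only candidates whose gate score reaches the threshold."""
    return [mem for mem in candidates
            if gate_score(query, mem[1], weights, bias) >= threshold]


# Toy embedding dimensions (assumed): [health topic, work context, personal context].
STORE: List[Memory] = [
    ("private diagnosis details", [1.0, 0.0, 1.0]),
    ("company sick-leave policy", [1.0, 1.0, 0.0]),
    ("favorite pizza topping", [0.0, 0.0, 1.0]),
]

# Hand-set weights (assumed): reward a shared topic, penalize work/personal mixing.
W = [[2.0, 0.0, 0.0],
     [0.0, 1.0, -4.0],
     [0.0, -4.0, 1.0]]
B = -1.0

work_query = [1.0, 1.0, 0.0]  # e.g. drafting a work email about sick leave
candidates = top_k_by_similarity(work_query, STORE, k=2)
admitted = admit(work_query, candidates, W, B)
```

In this toy setup, similarity search alone returns both the sick-leave policy and the private diagnosis for the work query, since both share the health topic. The gate then rejects the diagnosis because the query carries a work context, while the same memory would pass for a query with a personal context. A trained gate would learn such interactions from data rather than from hand-set weights.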