Beyond Similarity: Trustworthy Memory Search for Personal AI Agents

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current personal AI agents rely heavily on semantic similarity for long-term memory retrieval, which introduces trustworthiness risks such as cross-domain leakage, sycophancy, tool-use misalignment, and memory-induced jailbreaking. To address this, the work proposes MemGate, a lightweight, task-conditioned neural gating mechanism (9M parameters, 35.1MB) that reframes memory retrieval as a trust-aware access control process. MemGate is inserted between the vector memory store and the large language model without modifying either component, and is presented as the first approach to filter memories by task intent. Evaluated across mainstream memory frameworks (A-Mem, Mem0, MemOS) and the real-world agent environment OpenClaw, MemGate reduces memory-induced threats while preserving memory utility, which suggests that it generalizes across frameworks and is practical to deploy.
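As a quick sanity check on the reported size, the arithmetic below relates the 9M parameter count to the 35.1MB footprint. It assumes 32-bit floating-point weights, which the summary does not state; the variable names are illustrative only.

```python
# Assumption (not stated in the source): weights are stored as float32.
PARAMS = 9_000_000
BYTES_PER_PARAM = 4

size_mb = PARAMS * BYTES_PER_PARAM / 1e6       # decimal megabytes
size_mib = PARAMS * BYTES_PER_PARAM / 2**20    # binary mebibytes

# Parameter counts implied by a 35.1 "MB" file under each reading of MB.
implied_params_mb = 35.1e6 / BYTES_PER_PARAM
implied_params_mib = 35.1 * 2**20 / BYTES_PER_PARAM

print(f"9M float32 params: {size_mb:.1f} MB / {size_mib:.1f} MiB")
print(f"35.1 MB implies about {implied_params_mb / 1e6:.2f}M params")
print(f"35.1 MiB implies about {implied_params_mib / 1e6:.2f}M params")
```

Under this assumption the reported footprint falls between the decimal and binary estimates, so the two figures are mutually consistent once "9M" is read as a rounded count.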

📝 Abstract

Personal AI agents increasingly rely on long-term memory to provide persistent personalization across sessions. However, existing memory pipelines are largely driven by semantic similarity: memory data close to the current query is retrieved and injected into the model context. This creates a critical trustworthiness gap, since a semantically related memory may still be contextually inappropriate, leading to threats such as cross-domain leakage, sycophancy, tool-call drift, or memory-induced jailbreaks. In this paper, we study memory search as a trust boundary in personal AI agents. We evaluate representative agentic memory frameworks, including A-Mem, Mem0, and MemOS, together with OpenClaw, a real-world personal-agent environment with persistent state and tool-use capability. Our results show that long-term memory is not merely a utility layer, but a durable control channel that can reshape how agents interpret tasks and execute actions, leaving them highly susceptible to the aforementioned threats. To mitigate these vulnerabilities, we propose MemGate, a lightweight and deployable memory plug-in for trustworthy memory search, with only 9M parameters and a 35.1MB footprint. MemGate is inserted between the vector memory store and the backbone LLM, requiring no LLM modification, memory-database rewriting, or inference-time LLM judge. It applies a query-conditioned neural gate to candidate memory representations, turning raw similarity search into task-conditioned memory admission. Across multiple mainstream memory frameworks, real-world agent settings, and diverse LLM backbones, MemGate reduces memory-induced threats while preserving long-term memory utility.
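The placement described in the abstract (a query-conditioned gate between similarity search and the LLM context) can be illustrated with a toy sketch. This is not the authors' implementation: the bilinear gate, the hand-set weight matrix `W`, the 3-dimensional embeddings, and every function name here are hypothetical stand-ins for a learned 9M-parameter model.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list  # toy dims: [scheduling topic, work domain, health domain]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def similarity_search(query_emb, store, k):
    """Plain top-k retrieval by cosine similarity (the baseline pipeline)."""
    ranked = sorted(store, key=lambda m: cosine(query_emb, m.embedding), reverse=True)
    return ranked[:k]


def gate_score(query_emb, mem_emb, W, bias):
    """Hypothetical bilinear gate: sigmoid(q^T W m + bias)."""
    z = bias + sum(
        query_emb[i] * W[i][j] * mem_emb[j]
        for i in range(len(query_emb))
        for j in range(len(mem_emb))
    )
    return 1.0 / (1.0 + math.exp(-z))


def gated_retrieve(query_emb, store, W, bias, k=2, threshold=0.5):
    """Similarity search proposes candidates; the gate decides admission."""
    candidates = similarity_search(query_emb, store, k)
    return [m for m in candidates if gate_score(query_emb, m.embedding, W, bias) >= threshold]


store = [
    Memory("Team standup is at 9:30 on weekdays", [0.9, 0.8, 0.0]),
    Memory("Physiotherapy appointment every Tuesday at 9:00", [0.9, 0.0, 0.8]),
    Memory("Prefers oat milk", [0.0, 0.1, 0.1]),
]

# Toy query: "Draft an email to my manager proposing meeting times."
query = [1.0, 0.6, 0.0]

# Hand-set weights for illustration: reward same-domain pairs, penalize cross-domain pairs.
W = [[0.0, 0.0, 0.0],
     [0.0, 6.0, -6.0],
     [0.0, -6.0, 6.0]]
BIAS = -1.0

baseline = [m.text for m in similarity_search(query, store, k=2)]
admitted = [m.text for m in gated_retrieve(query, store, W, BIAS, k=2)]
print("similarity only:", baseline)
print("after gate:     ", admitted)
```

In this toy setup, similarity search alone returns the health-related memory because it shares the scheduling topic with the query, while the gate admits only the work-domain memory. The LLM and the memory store are untouched; only the candidate list changes, which mirrors the plug-in placement the abstract describes.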

Problem

Research questions and friction points this paper is trying to address.

trustworthiness

memory search

personal AI agents

semantic similarity

contextual appropriateness

Innovation

Methods, ideas, or system contributions that make the work stand out.

trustworthy memory search

MemGate

personal AI agents