M+: Extending MemoryLLM with Scalable Long-Term Memory

📅 2025-02-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models suffer from long-term memory decay and limited latent-space capacity, which restrict knowledge retention to short contexts (MemoryLLM, for example, struggles to retain information beyond ~20k tokens). This paper proposes M+, an architecture that builds on MemoryLLM by coupling its latent-space memory pool with a long-term memory mechanism and a lightweight retriever, trained jointly end-to-end, that dynamically retrieves relevant entries during text generation. The design overcomes the capacity bottleneck of the implicit hidden-state memory while adding negligible GPU memory overhead, extending effective knowledge retention from under 20k to over 160k tokens. Experiments on long-context understanding and knowledge-retention benchmarks show that M+ significantly outperforms MemoryLLM and strong recent baselines, enabling stable and precise ultra-long-range information retrieval.

📝 Abstract
Equipping large language models (LLMs) with latent-space memory has attracted increasing attention as they can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), as a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths up to 16k tokens, it struggles to retain knowledge beyond 20k tokens. In this work, we address this limitation by introducing M+, a memory-augmented model based on MemoryLLM that significantly enhances long-term information retention. M+ integrates a long-term memory mechanism with a co-trained retriever, dynamically retrieving relevant information during text generation. We evaluate M+ on diverse benchmarks, including long-context understanding and knowledge retention tasks. Experimental results show that M+ significantly outperforms MemoryLLM and recent strong baselines, extending knowledge retention from under 20k to over 160k tokens with similar GPU memory overhead.
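As a rough illustration of the two-tier design the abstract describes (a bounded short-term memory pool whose overflowing entries migrate to a long-term store searched by a co-trained retriever) here is a minimal sketch in plain Python. All names (`TieredMemory`, `cosine`, `write`, `retrieve`) are hypothetical illustrations, not the paper's API: the actual system compresses information into transformer hidden states across layers and trains the retriever jointly with the model, which this toy similarity search does not capture.

```python
import math
from collections import deque

def cosine(a, b):
    # Cosine similarity between two vectors; stands in for the
    # learned retriever's scoring function (an assumption, not the paper's method).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class TieredMemory:
    """Toy two-tier memory: a bounded short-term pool plus an
    unbounded long-term store queried by a similarity-based retriever."""

    def __init__(self, short_capacity=4):
        self.short = deque()          # recent (key_vector, payload) pairs
        self.short_capacity = short_capacity
        self.long = []                # entries evicted from the short-term pool

    def write(self, key, payload):
        self.short.append((key, payload))
        # Overflowing short-term entries are offloaded to long-term
        # memory instead of being discarded.
        while len(self.short) > self.short_capacity:
            self.long.append(self.short.popleft())

    def retrieve(self, query, k=2):
        # Always expose the short-term pool; augment it with the k
        # long-term entries most similar to the current query.
        ranked = sorted(self.long, key=lambda kv: cosine(query, kv[0]), reverse=True)
        return [p for _, p in self.short] + [p for _, p in ranked[:k]]
```

The point of the sketch is the retention behavior: once the pool overflows, old entries remain reachable through retrieval rather than being lost, which is how M+ extends retention far beyond the fixed pool size at similar memory cost.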
Problem

Research questions and friction points this paper is trying to address.

Long-term Memory
Language Models
Memory Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

M+
Long-term memory capabilities
Information retrieval efficiency