EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts

📅 2025-02-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) suffer from degraded recall and question-answering performance, along with reduced robustness, when processing long contexts (16k–256k tokens). To address this, the paper proposes EpMAN (Episodic Memory AttentioN), a novel attention mechanism that encodes long documents into semantically retrievable memory units and dynamically reweights the stored key-value (KV) cache during decoding based on semantic relevance. EpMAN preserves global contextual modeling while sharpening local focus. The module is lightweight, integrates into standard Transformer decoders, and supports end-to-end training and efficient inference. Evaluated on multi-task long-context retrieval and QA benchmarks, EpMAN outperforms standard self-attention and popular retrieval-augmented generation (RAG) methods, generalizes well, and remains markedly more stable as context length increases.

📝 Abstract
Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce **EpMAN** -- a method for processing long contexts in an *episodic memory* module while *holistically attending to* semantically relevant context chunks. The output of *episodic attention* is then used to reweight the decoder's self-attention to the stored KV cache of the context during training and generation. When an LLM decoder is trained using **EpMAN**, its performance on multiple challenging single-hop long-context recall and question-answering benchmarks is found to be stronger and more robust across the range from 16k to 256k tokens than baseline decoders trained with self-attention, and popular retrieval-augmented generation frameworks.
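The core idea -- using chunk-level episodic relevance scores to reweight the decoder's self-attention over the cached context -- can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `epman_style_attention`, the per-chunk relevance weights, and the renormalization step are assumptions about how such a reweighting could be wired up.

```python
import numpy as np

def epman_style_attention(q, K, V, chunk_ids, chunk_relevance):
    """Single-query attention over a KV cache, reweighted per chunk.

    q: (d,) query vector
    K, V: (n, d) cached keys and values for the long context
    chunk_ids: (n,) index of the context chunk each cached token belongs to
    chunk_relevance: (num_chunks,) episodic relevance scores in [0, 1]
                     (assumed to come from an episodic-memory retrieval step)
    """
    # Standard scaled dot-product attention logits over cached positions.
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Reweight each token's attention by its chunk's episodic relevance,
    # then renormalize so the weights again sum to one.
    weights = weights * chunk_relevance[chunk_ids]
    weights /= weights.sum()
    return weights @ V
```

In this sketch, tokens from chunks judged irrelevant by the episodic memory receive (near-)zero attention mass, so the decoder's output is dominated by values from the relevant chunks; the actual method learns this reweighting end-to-end rather than applying it as a fixed post-hoc mask.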
Problem

Research questions and friction points this paper is trying to address.

Long context processing challenge
Episodic memory for attention
Enhanced LLM performance benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Episodic Memory AttentioN
Holistically attending context
Reweight decoder's self-attention