🤖 AI Summary
This work addresses the high incremental inference cost inherent in modeling ultra-long sequences. We propose a Transformer-based architecture with constant-time (O(1)) incremental inference. Methodologically, it combines local self-attention within segments with a segment-level recurrence mechanism, a differentiable associative retrieval module, and caching of long-context chunks, enabling distributed storage and efficient retrieval of task-specific long-range information. The core contribution is breaking the linear or quadratic per-step computational bottleneck of standard Transformers: each inference step incurs a fixed computational cost while preserving robust long-range dependency modeling. Evaluated on single-fact question answering in the BABILong benchmark over 50 million tokens, the method achieves 79.9% accuracy, substantially outperforming existing long-context foundation models.
📝 Abstract
This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time to process new information at each time step. Our approach, the Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives on associative retrieval tasks and sets a new performance record on the recent BABILong multi-task long-context benchmark, answering single-fact questions over 50 million tokens with an accuracy of 79.9%. The source code for training and evaluation is available on GitHub.
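To make the constant-time claim concrete, here is a minimal sketch of a differentiable associative memory of the kind the abstract alludes to: values are written as key-value outer products into a fixed-size matrix, so each write and each read costs O(d²) regardless of how many tokens have been seen. This is an illustrative toy, not ARMT's actual implementation; the function names and the orthogonal-key simplification are our own assumptions.

```python
import numpy as np

def update_memory(M, key, value):
    # Outer-product write: the memory matrix stays a fixed (d, d) size,
    # so the cost per update is O(d^2), independent of sequence length.
    return M + np.outer(value, key)

def retrieve(M, query):
    # Associative read: a single matrix-vector product, again O(d^2).
    return M @ query

d = 8
rng = np.random.default_rng(0)
M = np.zeros((d, d))
vals = rng.normal(size=(3, d))
# Orthonormal keys make retrieval exact in this toy setting;
# real associative memories only approximate this.
keys, _ = np.linalg.qr(rng.normal(size=(d, 3)))
keys = keys.T
for k, v in zip(keys, vals):
    M = update_memory(M, k, v)
# Querying with a stored key recovers the associated value.
assert np.allclose(retrieve(M, keys[0]), vals[0])
```

Because the memory footprint and per-step cost are fixed, stacking such a module with local self-attention over each segment yields the constant incremental inference cost described above.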