Hybrid Associative Memories

πŸ“… 2026-03-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work proposes a Hybrid Associative Memory (HAM) layer to address two complementary limitations: traditional recurrent neural networks (RNNs) struggle to capture long-range dependencies because they compress history into a fixed-size state, while self-attention mechanisms incur high computational and memory costs that scale quadratically with sequence length. HAM integrates the two by using the RNN to model the overall sequence while dynamically activating self-attention only at critical positions where the RNN's predictions are uncertain, explicitly storing key-value (KV) pairs into a cache at those positions. A single, learnable threshold governs this data-driven caching strategy, enabling significant reductions in KV cache usage without compromising performance. Empirical results demonstrate that HAM achieves a favorable balance between computational efficiency and expressive power, matching or surpassing pure RNNs and Transformers.
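The threshold-gated caching idea in the summary can be sketched in a few lines. This is a toy illustration only: the function names and the use of predictive entropy as the uncertainty measure are our assumptions for exposition, not the paper's actual implementation.

```python
import math

def entropy(probs):
    """Shannon entropy of a predictive distribution (an illustrative
    uncertainty proxy; the paper's exact criterion may differ)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def ham_cache_step(probs, key, value, kv_cache, tau):
    """Hypothetical HAM-style caching decision: append this step's
    (key, value) pair to the KV cache only when the RNN's prediction
    is uncertain, i.e. its entropy exceeds the threshold tau."""
    if entropy(probs) > tau:
        kv_cache.append((key, value))
    return kv_cache
```

Raising `tau` caches fewer positions (smaller KV cache, leaning on the RNN's compressed state); lowering it caches more, recovering attention-like recall, which matches the smooth cache-size/performance trade-off the paper reports.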

πŸ“ Abstract
Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, this memory is constructed using two orthogonal mechanisms: RNNs compress the entire past into a fixed-size state, whereas self-attention stores every past time step, so its state (the KV cache) grows linearly with the sequence length. This results in orthogonal strengths and weaknesses. Self-attention layers excel at retrieving information from the context but have large memory and computational costs, while RNNs are more efficient but degrade over longer contexts and underperform on precise recall tasks. Prior work combining these mechanisms has focused primarily on naively interleaving them to reduce computational cost, without regard to their complementary mechanisms. We propose the Hybrid Associative Memory (HAM) layer, which combines self-attention and RNNs while leveraging their individual strengths: the RNN compresses the entire sequence, while attention supplements it *only* with information that is difficult for the RNN to predict, which is hence the most valuable information to explicitly store. HAM layers enable data-dependent growth of the KV cache, which can be precisely controlled by the user with a single, continuous threshold. We find that this fine-grained control of the KV cache growth rate has a smooth trade-off with loss and performance. Empirically, we show that our hybrid architecture offers strong, competitive performance relative to RNNs and Transformers even at substantially lower KV-cache usage.
Problem

Research questions and friction points this paper is trying to address.

recurrent neural networks
self-attention
memory efficiency
sequence modeling
KV cache
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Associative Memory
RNN-Attention Integration
KV Cache Control
Data-Dependent Memory
Efficient Sequence Modeling