Q-Delta: Beyond Key-Value Associative State Evolution

📅 2026-06-07
🤖 AI Summary
This work addresses a limitation of existing linear attention mechanisms: they use queries solely for readout and exclude them from memory state evolution, which constrains model expressivity. We propose Q-Delta, a query-aware delta rule that incorporates query information directly into the state evolution process. By driving memory updates with mixed key-query prediction errors, Q-Delta improves expressiveness while preserving the computational efficiency of the delta rule. We establish stability guarantees for the resulting dynamics and derive a hardware-efficient chunkwise-parallel formulation with a custom Triton implementation. Empirical results show stable optimization, competitive throughput, and consistent improvements over strong baselines on language modeling and long-context retrieval tasks.
📝 Abstract
Linear attention reformulates sequence modeling as recurrent state evolution, enabling efficient linear-time inference. Under the key-value associative paradigm, existing approaches restrict the role of the query to the readout operation, decoupling it from state evolution. We show that query-conditioned state readout induces a structured value prediction over accumulated memory that complements key-based retrieval. Based on this insight, we propose Q-Delta, a query-aware delta rule that integrates mixed key-query prediction errors into state evolution, enabling jointly corrective dynamics while preserving delta-rule efficiency. We establish stability guarantees for the resulting dynamics and derive a hardware-efficient chunkwise-parallel formulation with a custom Triton implementation. Empirical results demonstrate stable optimization, competitive throughput, and consistent improvements over strong baselines on language modeling and long-context retrieval tasks.
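For orientation, the key-value paradigm that the abstract contrasts against can be sketched as one step of the standard delta rule, in which the query appears only in the readout. This is a minimal illustration of that baseline, not of Q-Delta itself: the abstract does not state Q-Delta's update equation, so none is shown here. The function name `delta_rule_step`, the scalar step size `beta`, and the row-major `d_v x d_k` state layout are assumptions made for the example.

```python
def delta_rule_step(S, k, v, q, beta):
    """One step of the standard (key-value) delta rule.

    S    : state matrix as a list of d_v rows, each of length d_k (assumed layout)
    k, q : key and query vectors of length d_k
    v    : value vector of length d_v
    beta : scalar step size (assumed name)
    """
    d_v, d_k = len(S), len(k)
    # Key-based prediction of the value from current memory: S k
    pred = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Prediction error: v - S k (the query plays no role here)
    err = [v[i] - pred[i] for i in range(d_v)]
    # Rank-one corrective update: S <- S + beta * err k^T
    S_new = [[S[i][j] + beta * err[i] * k[j] for j in range(d_k)]
             for i in range(d_v)]
    # Readout: the only place the query is used in this baseline
    out = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, out


# Usage: write one association, then overwrite it under the same unit-norm key.
S = [[0.0, 0.0], [0.0, 0.0]]
S, out = delta_rule_step(S, [1.0, 0.0], [3.0, 4.0], [1.0, 0.0], 1.0)
# out is [3.0, 4.0]
S, out = delta_rule_step(S, [1.0, 0.0], [5.0, 6.0], [1.0, 0.0], 1.0)
# out is [5.0, 6.0]: the old value bound to this key has been replaced
```

With a unit-norm key and `beta = 1`, the update replaces the stored value exactly, which is the corrective behavior the delta rule is known for. The query never enters the error term in this baseline, which is the decoupling the abstract describes.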
Problem

Research questions and friction points this paper is trying to address.

linear attention
state evolution
query conditioning
associative memory
sequence modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Q-Delta
linear attention
query-aware state evolution
delta rule
chunkwise parallelism