Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

📅 2025-12-14

🤖 AI Summary

To address the quadratic cost of softmax attention in long-context language models, and the error that accumulates when existing linear-attention methods discretize their state updates, this paper proposes Error-Free Linear Attention (EFLA). Methodologically, EFLA treats the online (delta-rule) attention update as a continuous-time dynamical system and proves that its exact solution is both attainable and computable in linear time. Exploiting the rank-1 structure of the dynamics matrix, it derives a closed-form update equivalent to an infinite-order Runge–Kutta scheme, which yields no accumulated discretization error, O(L) time complexity in the sequence length L, and fully parallelizable tensor computation. On the theory side, the update is shown to be free from error accumulation, so it tracks the continuous dynamics exactly rather than approximately, which supports faithful modeling of long-range dependencies. Experiments show that EFLA outperforms DeltaNet: it is more robust to noise, reaches lower language-modeling perplexity, and performs better on downstream benchmarks, all without introducing additional parameters.
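The summary does not spell out the dynamics, so the sketch below assumes a DeltaNet-style convention: a memory S of shape d_v x d_k, and within one step the ODE dS/dt = -(S k - v) k^T integrated over an interval of length beta. Under that assumption the rank-1 closed form is S_new = S - alpha (S k - v) k^T with alpha = (1 - exp(-beta ||k||^2)) / ||k||^2. The code, with hypothetical function names and pure-Python matrices, checks numerically that this closed form matches a fine-grained classical RK4 integration of the same ODE, while a single delta-rule step (forward Euler with step beta) does not.

```python
import math

def matvec(S, x):
    # S is a d_v x d_k matrix stored as a list of rows; returns S @ x
    return [sum(s * xj for s, xj in zip(row, x)) for row in S]

def rank1_correction(S, k, v, step):
    # S - step * (S k - v) k^T: the shared form of both one-step updates below
    err = [sk - vi for sk, vi in zip(matvec(S, k), v)]
    return [[s - step * e * kj for s, kj in zip(row, k)] for row, e in zip(S, err)]

def delta_rule_step(S, k, v, beta):
    # One forward-Euler step of length beta, i.e. the classic delta rule
    return rank1_correction(S, k, v, beta)

def exact_step(S, k, v, beta):
    # Closed form (assumed): alpha = (1 - exp(-beta * ||k||^2)) / ||k||^2, limit beta at k = 0
    lam = sum(kj * kj for kj in k)
    alpha = beta if lam == 0 else -math.expm1(-beta * lam) / lam
    return rank1_correction(S, k, v, alpha)

def ode_rhs(S, k, v):
    # Assumed continuous-time dynamics: dS/dt = -(S k - v) k^T
    err = [sk - vi for sk, vi in zip(matvec(S, k), v)]
    return [[-e * kj for kj in k] for e in err]

def add_scaled(S, D, h):
    # Elementwise S + h * D
    return [[s + h * d for s, d in zip(rs, rd)] for rs, rd in zip(S, D)]

def rk4_integrate(S, k, v, beta, n_steps):
    # Classical 4th-order Runge-Kutta with n_steps sub-steps over [0, beta]
    h = beta / n_steps
    for _ in range(n_steps):
        d1 = ode_rhs(S, k, v)
        d2 = ode_rhs(add_scaled(S, d1, h / 2), k, v)
        d3 = ode_rhs(add_scaled(S, d2, h / 2), k, v)
        d4 = ode_rhs(add_scaled(S, d3, h), k, v)
        S = [[s + h / 6 * (a + 2 * b + 2 * c + d)
              for s, a, b, c, d in zip(rs, r1, r2, r3, r4)]
             for rs, r1, r2, r3, r4 in zip(S, d1, d2, d3, d4)]
    return S

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

if __name__ == "__main__":
    S0 = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
    k, v, beta = [0.8, -0.5, 1.1], [1.0, -2.0], 0.7
    reference = rk4_integrate(S0, k, v, beta, n_steps=200)
    print("exact vs fine RK4:     ", max_abs_diff(exact_step(S0, k, v, beta), reference))
    print("delta rule vs fine RK4:", max_abs_diff(delta_rule_step(S0, k, v, beta), reference))
```

The first difference should be at the level of floating-point rounding, the second clearly nonzero. This is only a consistency check of the assumed ODE and its closed form, not a reproduction of the paper's kernel.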

📝 Abstract

Linear-time attention and State Space Models (SSMs) promise to solve the quadratic cost bottleneck in long-context language models employing softmax attention. We introduce Error-Free Linear Attention (EFLA), a numerically stable, fully parallelizable, and generalized formulation of the delta rule. Specifically, we formulate the online learning update as a continuous-time dynamical system and prove that its exact solution is not only attainable but also computable in linear time with full parallelism. By leveraging the rank-1 structure of the dynamics matrix, we directly derive the exact closed-form solution, which effectively corresponds to the infinite-order Runge-Kutta method. This attention mechanism is theoretically free from error accumulation, perfectly capturing the continuous dynamics while preserving linear-time complexity. Through an extensive suite of experiments, we show that EFLA enables robust performance in noisy environments, achieving lower language modeling perplexity and superior downstream benchmark performance than DeltaNet without introducing additional parameters. Our work provides a new theoretical foundation for building high-fidelity, scalable linear-time attention models.
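The closed-form step can be reconstructed as a short derivation, offered as a sketch under stated assumptions rather than the paper's own presentation: the state S_t of shape d_v x d_k follows a DeltaNet-style convention, and within step t the dynamics dS/dt = -(S k_t - v_t) k_t^T are integrated over an interval of length beta_t. The symbols lambda_t and alpha_t are our notation and may not match the paper's.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\bigl(S k_t - v_t\bigr)k_t^{\top} = -S A_t + v_t k_t^{\top},
  \qquad A_t = k_t k_t^{\top}, \quad \lambda_t = k_t^{\top} k_t, \\[4pt]
A_t^{n} &= \lambda_t^{\,n-1} A_t \ (n \ge 1)
  \;\Longrightarrow\;
  e^{-\beta_t A_t} = I - \frac{1 - e^{-\beta_t \lambda_t}}{\lambda_t}\, k_t k_t^{\top}, \\[4pt]
S_t &= S_{t-1} e^{-\beta_t A_t} + \int_0^{\beta_t} v_t k_t^{\top} e^{-(\beta_t - \tau) A_t}\, d\tau
     = S_{t-1} - \alpha_t \bigl(S_{t-1} k_t - v_t\bigr) k_t^{\top}, \\[4pt]
\alpha_t &= \frac{1 - e^{-\beta_t \lambda_t}}{\lambda_t}
  = \beta_t - \frac{\beta_t^{2} \lambda_t}{2} + \frac{\beta_t^{3} \lambda_t^{2}}{6} - \cdots
\end{aligned}
```

Keeping only the first term of the series for alpha_t gives the ordinary delta rule, which is a single forward-Euler (first-order Runge-Kutta) step; keeping every term corresponds to the infinite-order limit the abstract refers to. The same algebra gives S_t k_t - v_t = e^{-beta_t lambda_t} (S_{t-1} k_t - v_t), so for beta_t > 0 and k_t != 0 the recall error for the current key always shrinks, and as lambda_t -> 0 the step size alpha_t tends to beta_t.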

Problem

Research questions and friction points this paper is trying to address.

Develops a numerically stable linear-time attention mechanism

Eliminates the discretization error that accumulates in linear-attention updates for long-context language models

Computes the exact solution of the continuous-time dynamics efficiently
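To make the stability point concrete, here is a toy sketch, not the paper's experiment: it repeatedly writes one key-value pair into a zero-initialized memory and tracks the recall error ||S k - v||. It assumes the DeltaNet-style update S <- S - step (S k - v) k^T, with step = beta for the ordinary delta rule and step = (1 - exp(-beta ||k||^2)) / ||k||^2 for the exact closed form. The values beta = 1.5 and ||k||^2 = 2 are chosen deliberately so that beta ||k||^2 > 2; all function names are hypothetical.

```python
import math

def recall_error(S, k, v):
    # Euclidean norm of S k - v: how far the memory is from returning v for key k
    return math.sqrt(sum((sum(s * kj for s, kj in zip(row, k)) - vi) ** 2
                         for row, vi in zip(S, v)))

def rank1_update(S, k, v, step):
    # S <- S - step * (S k - v) k^T, the shared form of both rules below
    err = [sum(s * kj for s, kj in zip(row, k)) - vi for row, vi in zip(S, v)]
    return [[s - step * e * kj for s, kj in zip(row, k)] for row, e in zip(S, err)]

def delta_rule_step_size(k, beta):
    # Forward Euler / classic delta rule: the step size is beta itself (k unused)
    return beta

def exact_step_size(k, beta):
    # Assumed closed-form step size (1 - exp(-beta*||k||^2)) / ||k||^2, limit beta at k = 0
    lam = sum(kj * kj for kj in k)
    return beta if lam == 0 else -math.expm1(-beta * lam) / lam

def repeated_writes(step_size_fn, k, v, beta, n_writes):
    # Start from an empty memory and write the same (k, v) pair n_writes times
    S = [[0.0] * len(k) for _ in v]
    errors = []
    for _ in range(n_writes):
        S = rank1_update(S, k, v, step_size_fn(k, beta))
        errors.append(recall_error(S, k, v))
    return errors

if __name__ == "__main__":
    k, v, beta = [1.0, 1.0], [1.0], 1.5  # beta * ||k||^2 = 3 > 2
    print("delta rule:", repeated_writes(delta_rule_step_size, k, v, beta, 10)[-1])
    print("exact     :", repeated_writes(exact_step_size, k, v, beta, 10)[-1])
```

Each delta-rule write multiplies the recall error by (1 - beta ||k||^2), which is -2 here, so the error doubles on every write; the exact update multiplies it by exp(-beta ||k||^2), roughly 0.05 here, so it can only shrink.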

Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous-time formulation of the delta-rule update that admits an exact solution

Rank-1 structure of the dynamics matrix, enabling an error-free closed-form update

Linear-time, fully parallel attention without error accumulation
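A minimal recurrent sketch of an EFLA-style layer may clarify the linear-time claim: each token performs one rank-1 update of a d_v x d_k state and one readout o_t = S_t q_t, so the total work is O(L d_k d_v) for a sequence of length L. It assumes the closed-form step size alpha_t = (1 - exp(-beta_t ||k_t||^2)) / ||k_t||^2 and the readout convention o_t = S_t q_t; the names efla_step_size and efla_recurrent are hypothetical. It runs token by token and does not reproduce the fully parallel computation the paper describes.

```python
import math
from typing import List

Vector = List[float]

def efla_step_size(k: Vector, beta: float) -> float:
    # Assumed closed-form step size; tends to beta as ||k||^2 -> 0
    lam = sum(x * x for x in k)
    return beta if lam == 0.0 else -math.expm1(-beta * lam) / lam

def efla_recurrent(qs: List[Vector], ks: List[Vector],
                   vs: List[Vector], betas: List[float]) -> List[Vector]:
    # Sequential scan over tokens: O(d_k * d_v) work per token, O(L) overall
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_k for _ in range(d_v)]  # memory state, list of d_v rows
    outputs = []
    for q, k, v, beta in zip(qs, ks, vs, betas):
        alpha = efla_step_size(k, beta)
        # Prediction error of the current memory for key k: S k - v
        err = [sum(s * x for s, x in zip(row, k)) - vi for row, vi in zip(S, v)]
        # In-place rank-1 correction: S <- S - alpha * (S k - v) k^T
        for row, e in zip(S, err):
            for j, x in enumerate(k):
                row[j] -= alpha * e * x
        # Readout for this token: o_t = S_t q_t
        outputs.append([sum(s * x for s, x in zip(row, q)) for row in S])
    return outputs

if __name__ == "__main__":
    out = efla_recurrent(qs=[[1.0, 0.0]], ks=[[1.0, 0.0]],
                         vs=[[2.0, 3.0]], betas=[50.0])
    print(out)  # [[2.0, 3.0]]
```

With a large beta_t the step size saturates at 1 / ||k_t||^2, so querying with the key that was just written returns v_t; with a small beta_t the step size is close to beta_t and the layer behaves like an ordinary delta-rule recurrence.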
