ZeroS: Zero-Sum Linear Attention for Efficient Transformers

📅 2026-02-05
🤖 AI Summary
Although linear attention achieves O(N) complexity, its long-context performance is limited by two constraints: the restriction to convex combinations of values and a uniform accumulated-weight bias. This work proposes Zero-Sum Linear Attention (ZeroS), which removes the constant zeroth-order term and reweights the zero-sum softmax residual, lifting the conventional non-negativity constraint on attention weights. By allowing both positive and negative attention values, ZeroS gives a single attention layer contrastive capability, substantially enhancing its expressive power. The method preserves O(N) computational complexity while expanding the set of representable functions, and it matches or even surpasses standard softmax attention across multiple sequence modeling benchmarks.
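The decomposition that motivates ZeroS can be written out explicitly. The notation below is reconstructed from the summary's description, not taken from the paper: at position $t$, causal softmax attention produces convex weights

```latex
\alpha_{t,i} = \frac{\exp(q_t^\top k_i)}{\sum_{j=1}^{t} \exp(q_t^\top k_j)},
\qquad \sum_{i=1}^{t} \alpha_{t,i} = 1, \qquad \alpha_{t,i} \ge 0,
```

which split into a uniform zeroth-order term and a zero-sum residual:

```latex
\alpha_{t,i} = \frac{1}{t} + r_{t,i}, \qquad \sum_{i=1}^{t} r_{t,i} = 0 .
```

Dropping the $1/t$ term and reweighting the residuals $r_{t,i}$ yields signed weights, so the output is no longer restricted to a convex combination of value vectors.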

📝 Abstract
Linear attention methods offer Transformers $O(N)$ complexity but typically underperform standard softmax attention. We identify two fundamental limitations affecting these approaches: the restriction to convex combinations that only permits additive information blending, and uniform accumulated weight bias that dilutes attention in long contexts. We propose Zero-Sum Linear Attention (ZeroS), which addresses these limitations by removing the constant zeroth-order term $1/t$ and reweighting the remaining zero-sum softmax residuals. This modification creates mathematically stable weights, enabling both positive and negative values and allowing a single attention layer to perform contrastive operations. While maintaining $O(N)$ complexity, ZeroS theoretically expands the set of representable functions compared to convex combinations. Empirically, it matches or exceeds standard softmax attention across various sequence modeling benchmarks.
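A minimal NumPy sketch of the zero-sum idea, assuming the decomposition described in the abstract. The function name, the single `scale` reweighting parameter, and the quadratic causal loop are illustrative choices, not the paper's actual reweighting scheme or its $O(N)$ kernelized form:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def zero_sum_attention(Q, K, V, scale=1.0):
    """Hypothetical sketch: causal attention whose weights are the zero-sum
    softmax residual (softmax weight minus the uniform 1/t term), rescaled
    by `scale`. Because residuals can be negative, a single layer can
    subtract (contrast) value vectors as well as add them."""
    N, d = Q.shape
    out = np.empty_like(V)
    for t in range(N):
        scores = Q[t] @ K[:t + 1].T / np.sqrt(d)
        alpha = softmax(scores)              # convex weights, sum to 1
        resid = alpha - 1.0 / (t + 1)        # zero-sum residual, sums to 0
        out[t] = (scale * resid) @ V[:t + 1] # signed, non-convex combination
    return out
```

The explicit loop is for clarity only; recovering $O(N)$ complexity would require a linear-attention-style recurrent formulation, which the abstract attributes to the full method.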
Problem

Research questions and friction points this paper is trying to address.

linear attention
convex combinations
attention dilution
zero-sum
transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-Sum Linear Attention
O(N) Complexity
Contrastive Attention
Non-Convex Weighting
Efficient Transformers