🤖 AI Summary
This work resolves the open problem of whether softmax-attention Chain-of-Thought (CoT) Transformers are Turing-complete, proving the stronger result that even length-generalizable softmax CoT Transformers are Turing-complete. The proof proceeds through the CoT extension of the Counting RASP (C-RASP), a formal model that corresponds to softmax CoT Transformers admitting length generalization. With causal masking, CoT C-RASP is shown to be Turing-complete over a unary alphabet (and, more generally, for letter-bounded languages), but not for arbitrary languages; augmenting it with relative positional encoding restores Turing completeness for arbitrary languages. Whereas hard-attention CoT Transformers were already known to be Turing-complete, these results give the first such characterization for the softmax (soft-attention) setting. The theory is validated empirically by training Transformers on languages that require complex, non-linear arithmetic reasoning.
📝 Abstract
Hard-attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it has remained an open problem whether softmax-attention CoT transformers are Turing-complete. In this paper, we prove the stronger result that length-generalizable softmax CoT transformers are Turing-complete. More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which corresponds to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (and, more generally, for letter-bounded languages). While we show that this model is not Turing-complete for arbitrary languages, we prove that its extension with relative positional encoding is Turing-complete for arbitrary languages. We empirically validate our theory by training transformers on languages requiring complex (non-linear) arithmetic reasoning.
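To make the counting primitive behind C-RASP concrete, here is a minimal NumPy sketch (an illustration, not code from the paper): a causal softmax attention head whose query-key scores are all equal attends uniformly over the prefix, so its output at each position is the running mean of the value stream. If the values indicate occurrences of a symbol, the head outputs the proportion of that symbol in the prefix, which is the kind of counting quantity C-RASP reasons about.

```python
import numpy as np

def counting_head(values):
    """Causal softmax attention with constant scores.

    When every query-key score is equal, softmax over the causal
    prefix is uniform, so the head returns the running mean of
    `values`. If `values` is the indicator of a symbol, position i
    yields count(symbol in prefix) / (i + 1).
    """
    n = len(values)
    out = np.empty(n)
    for i in range(n):
        scores = np.zeros(i + 1)                       # constant scores
        weights = np.exp(scores) / np.exp(scores).sum()  # uniform softmax
        out[i] = weights @ values[: i + 1]             # running mean
    return out

# Indicator of symbol 'a' in the string "abaab".
vals = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
print(counting_head(vals))  # running proportion of 'a': [1.0, 0.5, 2/3, 0.75, 0.6]
```

A downstream layer can compare such ratios (e.g., test whether the count of `a` exceeds the count of `b` so far), which is how counting predicates are expressed without hard attention.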