Fixed Universal Transformers

📅 2026-05-29

🤖 AI Summary
This work investigates the source of expressive power in Transformer models and introduces a universal Transformer with fixed parameters that can simulate any Transformer in a given class by encoding a description of the target model in the input embeddings. The authors give explicit sparse constructions that achieve universality when the embedding dimension is sufficiently large, and show that a randomly initialized Transformer is universal almost surely, which points to the input representation, rather than the trainable parameters, as a key determinant of expressivity. The theory is validated empirically on parenthesis balancing and multi-hop reasoning tasks, and the results suggest that much of the model's expressive power may reside in how inputs are encoded rather than in learned weights.
📝 Abstract
We introduce universal transformers: fixed transformers that can simulate any transformer in a given class via a suitable input embedding. Analogous to a universal Turing machine, the input embedding encodes a description of the target model while all internal parameters remain fixed. We provide explicit sparse constructions achieving universality when the embedding dimension is sufficiently large, and further show that universality is generic: randomly initialized transformers are universal almost surely, which aligns with recent empirical results of Zhong and Andreas (2024). We empirically validate our theory on the algorithmic tasks of parenthesis balancing and multi-hop reasoning. Our results suggest that much of a transformer's expressive power may reside in its input representation rather than its learned weights.
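The universal Turing machine analogy in the abstract can be illustrated with a small toy, offered here as a sketch rather than the paper's construction. It assumes a much simpler target class than transformers, namely one-hidden-layer ReLU networks f(x) = v · relu(W x), and a hypothetical embedding that flattens (W, v) and appends x. The names `target_model`, `encode`, and `fixed_simulator` are invented for illustration. The point is only that one fixed function, with no target-specific parameters, reproduces different targets when their description is supplied in the input.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def target_model(W: List[List[float]], v: List[float], x: List[float]) -> float:
    """A 'target' model that owns its weights: f(x) = v . relu(W x)."""
    hidden = [max(0.0, dot(row, x)) for row in W]
    return dot(v, hidden)


def encode(W: List[List[float]], v: List[float], x: List[float]) -> List[float]:
    """Hypothetical embedding (an assumption, not the paper's encoding):
    [hidden size, input size, W in row-major order, v, x]."""
    flat = [float(len(W)), float(len(x))]
    for row in W:
        flat.extend(row)
    flat.extend(v)
    flat.extend(x)
    return flat


def fixed_simulator(embedding: Sequence[float]) -> float:
    """A fixed function with no target-specific parameters.
    Everything about the target is read from the embedding."""
    h, d = int(embedding[0]), int(embedding[1])
    body = list(embedding[2:])
    W = [body[i * d:(i + 1) * d] for i in range(h)]
    v = body[h * d:h * d + h]
    x = body[h * d + h:h * d + h + d]
    hidden = [max(0.0, dot(row, x)) for row in W]
    return dot(v, hidden)


# Two different targets, simulated by the same fixed function.
W_a, v_a = [[1.0, -2.0], [0.5, 3.0]], [2.0, -1.0]
W_b, v_b = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], [1.0, 1.0, -1.0]
x = [1.0, 1.0]

out_a = fixed_simulator(encode(W_a, v_a, x))
out_b = fixed_simulator(encode(W_b, v_b, x))
```

In this toy the simulator simply decodes and evaluates the description. The abstract's claim is stronger: a transformer with fixed (or even random) internal parameters plays the simulator's role, provided the embedding dimension is sufficiently large.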
Problem

Research questions and friction points this paper is trying to address.

universal transformers
fixed architecture
input embedding
model simulation
expressive power
Innovation

Methods, ideas, or system contributions that make the work stand out.

universal transformers
fixed architecture
input embedding
expressive power
random initialization