Fixed Universal Transformers

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

career value

151K/year

🤖 AI Summary

This work investigates the source of expressive power in Transformer models and introduces a parameter-fixed universal Transformer architecture capable of simulating any specified Transformer by encoding its structural description directly into the input embeddings. The authors present the first explicit construction of a sparse universal Transformer and demonstrate that a randomly initialized Transformer is, with high probability, universal—highlighting the pivotal role of input representation rather than trainable parameters in determining model expressivity. Theoretical analyses are empirically validated on bracket matching and multi-hop reasoning tasks, confirming that the model’s capabilities primarily stem from how inputs are encoded, not from learned weights.

📝 Abstract

We introduce \emph{universal transformers}: fixed transformers that can simulate any transformer in a given class via a suitable input embedding. Analogous to a universal Turing machine, the input embedding encodes a description of the target model while all internal parameters remain fixed. We provide explicit sparse constructions achieving universality when the embedding dimension is sufficiently large, and further show that universality is generic: randomly initialized transformers are universal almost surely, which aligns with recent empirical results of Zhong and Andreas (2024). We empirically validate our theory on the algorithmic tasks of parenthesis balancing and multi-hop reasoning. Our results suggest that much of a transformer's expressive power may reside in its input representation rather than its learned weights.

Problem

Research questions and friction points this paper is trying to address.

universal transformers

fixed architecture

input embedding

model simulation

expressive power

Innovation

Methods, ideas, or system contributions that make the work stand out.

universal transformers

fixed architecture

input embedding