🤖 AI Summary
Chiang and Cholak’s two-layer Transformer construction for the parity language relies on a positional encoding that depends on the input length, limiting its scalability and theoretical generality. Method: We propose a three-layer constant-dimension Transformer whose architecture is entirely input-length-agnostic: all weight matrices are fixed and the positional encoding is length-independent. Contribution/Results: Using tools from formal language theory, we constructively prove that this model exactly recognizes the parity language. Unlike the prior two-layer construction, which depends on a length-dependent positional encoding, our design trades one additional layer for a full decoupling of the model from the input length, yielding a fully scale-invariant Transformer construction and a gain in both recognition capability and structural simplicity.
📝 Abstract
We construct a 3-layer constant-dimension transformer that recognizes the parity language, in which neither the parameter matrices nor the positional encoding depend on the input length. This improves upon a construction of Chiang and Cholak, whose positional encoding depends on the input length (though their construction uses only 2 layers).
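As a point of reference, the predicate the transformer is built to compute can be stated directly. A minimal sketch, assuming the usual PARITY convention (binary strings containing an odd number of 1s; the exact odd-vs-even convention is an assumption here, not taken from the abstract):

```python
def in_parity(w: str) -> bool:
    """Membership test for the parity language over the alphabet {0, 1}.

    A string belongs to PARITY iff it contains an odd number of 1s.
    (The odd-count convention is assumed; some authors use the even count.)
    """
    assert set(w) <= {"0", "1"}, "input must be a binary string"
    return w.count("1") % 2 == 1
```

The construction in the paper is claimed to compute exactly this predicate for inputs of every length, using one fixed set of weights and a length-independent positional encoding.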