🤖 AI Summary
Chiang and Cholak’s two-layer Transformer construction for the parity language relies on a positional encoding that depends on the input length, limiting its scalability and theoretical generality. Method: We propose a three-layer constant-dimension Transformer whose architecture is entirely input-length-agnostic: all weight matrices are fixed and the positional encoding is length-independent. Contribution/Results: Using tools from formal language theory, we constructively prove that this model exactly recognizes the parity language. Unlike the prior two-layer construction, which depends on a length-dependent positional encoding, our design trades one additional layer for a full decoupling of the model from the input length, yielding a fully scale-invariant Transformer construction and a gain in both recognition capability and structural simplicity.
📝 Abstract
We construct a 3-layer constant-dimension transformer that recognizes the parity language, in which neither the parameter matrices nor the positional encoding depend on the input length. This improves upon a construction of Chiang and Cholak, whose positional encoding depends on the input length (though their construction uses only 2 layers).
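As a point of reference, the predicate the transformer is built to compute can be stated directly. A minimal sketch, assuming the usual PARITY convention (binary strings containing an odd number of 1s; the exact odd-vs-even convention is an assumption here, not taken from the abstract):

```python
def in_parity(w: str) -> bool:
    """Membership test for the parity language over the alphabet {0, 1}.

    A string belongs to PARITY iff it contains an odd number of 1s.
    (The odd-count convention is assumed; some authors use the even count.)
    """
    assert set(w) <= {"0", "1"}, "input must be a binary string"
    return w.count("1") % 2 == 1
```

The construction in the paper is claimed to compute exactly this predicate for inputs of every length, using one fixed set of weights and a length-independent positional encoding.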