A completely uniform transformer for parity

📅 2025-01-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Chiang and Cholak's two-layer transformer for parity relies on a positional encoding that depends on the input length, so the construction must change as inputs grow, limiting its uniformity and theoretical generality. Method: The authors construct a three-layer constant-dimension transformer that is entirely input-length-agnostic: all parameter matrices are fixed and the positional encoding is length-independent. Contribution/Results: Drawing on formal language theory, they constructively prove that this model exactly recognizes the parity language. Unlike the prior two-layer construction with its length-dependent positional encoding, the design spends one extra layer to fully decouple the model from the input length, yielding the first completely uniform transformer for parity and advancing both theoretical language-recognition capability and structural simplicity.
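For context, the parity language itself is easy to state. Below is a minimal membership test, assuming the odd-number-of-1s convention used by Chiang and Cholak (the even-count convention also appears in the literature):

```python
def in_parity(w: str) -> bool:
    """Return True iff w is in PARITY, i.e. contains an odd number of 1s."""
    return w.count("1") % 2 == 1

# Recognizing this language with a fixed-size transformer is the paper's
# subject; the difficulty is that the parity of a long string depends on
# every single input bit.
assert in_parity("1011")       # three 1s -> accepted
assert not in_parity("1001")   # two 1s   -> rejected
assert not in_parity("")       # zero 1s  -> rejected
```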

📝 Abstract
We construct a 3-layer constant-dimension transformer recognizing the parity language, where neither the parameter matrices nor the positional encoding depend on the input length. This improves upon a construction of Chiang and Cholak, who use a positional encoding that depends on the input length (their construction, however, has 2 layers).
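The distinction the abstract draws can be made concrete. Here is a small sketch contrasting the two kinds of positional encodings; the length-dependent form i/n matches the one Chiang and Cholak use, while the length-independent variant (and both function names) are illustrative choices, not the construction from this paper:

```python
import math

def pe_length_dependent(i: int, n: int) -> float:
    # Chiang-and-Cholak-style encoding: the value at position i
    # changes with the total input length n.
    return i / n

def pe_length_independent(i: int) -> tuple[float, float]:
    # A uniform encoding: a fixed function of the position i alone,
    # so a position gets the same value at every input length.
    return (math.sin(i), math.cos(i))

print(pe_length_dependent(3, 8), pe_length_dependent(3, 16))  # 0.375 0.1875
print(pe_length_independent(3))  # identical for all input lengths
```

Making every weight and every positional value independent of the input length is what the title's "completely uniform" refers to.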
Problem

Research questions and friction points this paper is trying to address.

language recognition
transformer expressivity
input length variability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-layer constant-dimension architecture
Length-independent positional encoding
Parameter matrices fixed across all input lengths
A. Kozachinskiy
Centro Nacional de Inteligencia Artificial, Chile
Tomasz Steifer
Polish Academy of Sciences
machine learning & AI theory