ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Standard absolute and relative position encodings in Transformers suffer from poor extrapolation to sequences significantly longer than those seen during training, leading to degraded generalization. To address this, we propose Exact Position Encoding (ExPE), which directly embeds precise positional values into designated dimensions of token representations—enabling lossless, high-fidelity positional modeling without approximation. ExPE is the first absolute position encoding scheme to achieve reliable length extrapolation in generative Transformer models while fully preserving original semantic representations. Empirical results demonstrate that ExPE substantially reduces perplexity on sequences far exceeding training lengths—outperforming both RoPE and sinusoidal encoding—with markedly improved length generalization and stability.

📝 Abstract
This paper introduces "Exact Positional Embeddings" (ExPE), a novel absolute positional embedding method for transformer models that can extrapolate to sequences longer than those it was trained on. Traditional transformer models rely on absolute or relative position embeddings to incorporate positional information into token embeddings, and these often struggle to extrapolate to sequences longer than those seen during training. Our proposed method uses a novel embedding strategy that encodes exact positional information by overriding specific dimensions of the embedding vectors, enabling a more precise representation of token positions. The approach not only maintains the integrity of the original embeddings but also enhances the model's ability to generalize to longer sequences. In causal language modeling, ExPE embeddings significantly reduce perplexity compared to rotary and sinusoidal embeddings when tested on sequences longer than those used in training.
Problem

Research questions and friction points this paper is trying to address.

Addresses positional embedding limitations in transformers for long sequences
Enables extrapolation to sequences longer than training data
Improves positional accuracy while maintaining embedding integrity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact Positional Embeddings override specific embedding dimensions
Method enables precise representation of token positions
Approach maintains embedding integrity for better extrapolation
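The core mechanism described above can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's actual formula: the number of reserved dimensions (`num_pos_dims`), the normalization by `max_len`, and the per-dimension scaling are all hypothetical choices made here for clarity. The key property it demonstrates is that a few designated dimensions are overwritten with exact position values while all remaining dimensions are left untouched.

```python
import numpy as np

def expe_encode(token_embeddings, num_pos_dims=4, max_len=8192):
    """Hypothetical sketch of an ExPE-style encoding: overwrite a few
    designated embedding dimensions with exact position values, leaving
    the remaining (semantic) dimensions unchanged."""
    seq_len, d_model = token_embeddings.shape
    out = token_embeddings.copy()
    positions = np.arange(seq_len, dtype=np.float64)
    # Write the exact position into each reserved dimension at a
    # different scale (an assumed choice), so positions beyond the
    # training length still map to well-defined, unseen-but-smooth values.
    for k in range(num_pos_dims):
        out[:, k] = positions / (max_len ** (k / num_pos_dims))
    return out
```

Because the positional values are written directly rather than mixed additively into the embedding, the semantic content of the non-reserved dimensions is preserved exactly, which is the "embedding integrity" property the paper emphasizes.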
Aleksis Datseris
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
Sylvia Vassileva
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
Ivan Koychev
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
Svetla Boytcheva
Ontotext
Artificial Intelligence · Computational Linguistics · Medical Informatics · Machine Learning