ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Standard absolute and relative position encodings in Transformers suffer from poor extrapolation to sequences significantly longer than those seen during training, leading to degraded generalization. To address this, we propose Exact Position Encoding (ExPE), which directly embeds precise positional values into designated dimensions of token representations—enabling lossless, high-fidelity positional modeling without approximation. ExPE is the first absolute position encoding scheme to achieve reliable length extrapolation in generative Transformer models while fully preserving original semantic representations. Empirical results demonstrate that ExPE substantially reduces perplexity on sequences far exceeding training lengths—outperforming both RoPE and sinusoidal encoding—with markedly improved length generalization and stability.

📝 Abstract
This paper introduces "Exact Positional Embeddings" (ExPE), a novel absolute positional embedding method for transformer models that can extrapolate to sequences longer than those it was trained on. Traditional transformer models rely on absolute or relative position embeddings to incorporate positional information into token embeddings, and these often struggle to extrapolate to sequences longer than those seen during training. Our proposed method uses a novel embedding strategy that encodes exact positional information by overriding specific dimensions of the embedding vectors, enabling a more precise representation of token positions. The approach not only maintains the integrity of the original embeddings but also enhances the model's ability to generalize to longer sequences. In causal language modeling, ExPE embeddings significantly reduce perplexity compared to rotary and sinusoidal embeddings when tested on sequences longer than those used in training.
Problem

Research questions and friction points this paper is trying to address.

Addresses positional embedding limitations in transformers for long sequences
Enables extrapolation to sequences longer than training data
Improves positional accuracy while maintaining embedding integrity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact Positional Embeddings override specific embedding dimensions
Method enables precise representation of token positions
Approach maintains embedding integrity for better extrapolation
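The core mechanism described above can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's actual formula: the number of reserved dimensions (`num_pos_dims`), the normalization by `max_len`, and the per-dimension scaling are all hypothetical choices made here for clarity. The key property it demonstrates is that a few designated dimensions are overwritten with exact position values while all remaining dimensions are left untouched.

```python
import numpy as np

def expe_encode(token_embeddings, num_pos_dims=4, max_len=8192):
    """Hypothetical sketch of an ExPE-style encoding: overwrite a few
    designated embedding dimensions with exact position values, leaving
    the remaining (semantic) dimensions unchanged."""
    seq_len, d_model = token_embeddings.shape
    out = token_embeddings.copy()
    positions = np.arange(seq_len, dtype=np.float64)
    # Write the exact position into each reserved dimension at a
    # different scale (an assumed choice), so positions beyond the
    # training length still map to well-defined, unseen-but-smooth values.
    for k in range(num_pos_dims):
        out[:, k] = positions / (max_len ** (k / num_pos_dims))
    return out
```

Because the positional values are written directly rather than mixed additively into the embedding, the semantic content of the non-reserved dimensions is preserved exactly, which is the "embedding integrity" property the paper emphasizes.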
Aleksis Datseris
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
Sylvia Vassileva
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
Ivan Koychev
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
Svetla Boytcheva
Ontotext
Artificial Intelligence · Computational Linguistics · Medical Informatics · Machine Learning