RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional Rotary Position Embedding (RoPE) makes attention scores depend on relative position, but the value pathway remains position-agnostic. This work proposes RoVE, a parameter-free method that makes the value pathway sensitive to relative position by rotating value vectors together with key vectors. The authors further show that this turns RoPE attention into an attentive convolution, a perspective that unifies several independent formulations of the same operation across computer vision, robotics, and large language models. Experiments with GPT-2 models of 124M and 354M parameters show consistent gains over RoPE on few-shot in-context learning, out-of-distribution perplexity, and long-context retrieval, with the clearest improvements on tasks that require long-range aggregation.

📝 Abstract

Rotary Position Embeddings (RoPE) make attention scores position-relative but leave the value pathway position-blind: the message sent by a value token is the same regardless of its distance from the query. We propose RoVE, a parameter-free modification that makes values position-sensitive by rotating them simultaneously with keys, and show that it turns RoPE attention into attentive convolution. This new perspective unifies several independent formulations of the same operation across computer vision, robotics, and modern LLM architectures. Trained 124M and 354M GPT-2 models show consistent empirical gains over RoPE on few-shot in-context learning, out-of-distribution perplexity, and long-context retrieval, with the clearest improvements on tasks that require long-range aggregation.
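The idea of rotating values alongside keys can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names are hypothetical, each pair of feature dimensions is modeled as one complex number so that a rotation becomes a multiplication by a unit phase, and the final counter-rotation of the output by the query position is an assumption (the abstract does not spell it out) that makes each value's contribution depend only on the offset between key and query positions.

```python
import cmath
import math

def rotate(vec, pos, freqs):
    """Rotate each complex component of vec by the angle pos * freq (RoPE-style)."""
    return [x * cmath.exp(1j * pos * f) for x, f in zip(vec, freqs)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def rove_attention(q, ks, vs, q_pos, k_pos, freqs):
    """Single-query attention with rotated keys AND rotated values (illustrative only)."""
    scale = 1.0 / math.sqrt(2 * len(q))  # 2 real dimensions per complex component
    q_rot = rotate(q, q_pos, freqs)
    scores = []
    for k, p in zip(ks, k_pos):
        k_rot = rotate(k, p, freqs)
        # Re(a * conj(b)) is the real dot product of the underlying 2D pairs,
        # so the score depends only on the offset q_pos - p, as in RoPE.
        scores.append(scale * sum((a * b.conjugate()).real for a, b in zip(q_rot, k_rot)))
    weights = softmax(scores)

    out = [0j] * len(q)
    for w, v, p in zip(weights, vs, k_pos):
        v_rot = rotate(v, p, freqs)  # values are rotated with the key position
        out = [o + w * x for o, x in zip(out, v_rot)]
    # Assumption: undo the query-position rotation, so each value arrives
    # rotated by the relative offset (p - q_pos) rather than by an absolute position.
    return rotate(out, -q_pos, freqs)

# Toy inputs (made up for illustration): 3 tokens, 2 complex components each.
freqs = [1.0, 0.1]
q = [1 + 2j, 0.5 - 1j]
ks = [[0.3 + 0.1j, -1 + 0.5j], [1 - 1j, 0.2 + 0.2j], [-0.5 + 0.7j, 0.9 - 0.3j]]
vs = [[1 + 0j, 0 + 1j], [0.5 - 0.5j, 2 + 0j], [-1 + 1j, 0.3 + 0.3j]]

out_a = rove_attention(q, ks, vs, q_pos=5, k_pos=[3, 4, 5], freqs=freqs)
out_b = rove_attention(q, ks, vs, q_pos=105, k_pos=[103, 104, 105], freqs=freqs)

# Shifting every position by the same amount should leave the output unchanged.
max_diff = max(abs(a - b) for a, b in zip(out_a, out_b))
print(max_diff < 1e-9)
```

Under these assumptions, the message from token j to query i is its value rotated by an angle that depends only on j - i, which is loosely how a convolution kernel indexed by offset behaves, with the attention weights deciding how much of each message is mixed in.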

Problem

Research questions and friction points this paper is trying to address.

Rotary Position Embeddings

Attention Mechanism

Relative Position

Value Pathway

Position Sensitivity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rotary Position Embeddings

Value Pathway

Attentive Convolution