🤖 AI Summary
This work addresses the lack of a unified theoretical framework for rotary position encoding (RoPE) in high-dimensional settings, where existing methods struggle to model cross-dimensional interactions and exhibit directional bias. The paper proposes nD-RoPE, a decomposition-free rotary position encoding scheme applicable to arbitrary dimensions, and gives a unified theoretical formulation for high-dimensional RoPE. Starting from a translation-invariant formulation in continuous Hilbert space, the authors derive a spectral condition for isotropy that requires treating positions and frequencies as coupled n-dimensional vectors. They instantiate it with a multi-scale regular-simplex wave-vector design, which yields non-degenerate spatial coverage and a symmetric, directionally balanced second-order response. Experiments on image, video, and point cloud tasks show consistent performance gains and improved generalization in high-dimensional settings.
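To make "coupled position and frequency vectors" concrete, here is a minimal sketch. It assumes, as our reading and not the paper's code, that each 2D pair of a feature vector is rotated by the angle \(\theta_j = k_j \cdot p\), where \(k_j \in \mathbb{R}^n\) is a wave vector and \(p \in \mathbb{R}^n\) is the position. Under that assumption the attention logit depends on positions only through their difference. The names `rotate_nd` and `score` and the random wave vectors are hypothetical placeholders. NumPy is assumed.

```python
import numpy as np


def rotate_nd(x: np.ndarray, pos: np.ndarray, wave_vectors: np.ndarray) -> np.ndarray:
    """Rotate each 2D pair (x[2j], x[2j+1]) by the angle theta_j = k_j . pos.

    x:            (2m,) feature vector (query or key)
    pos:          (n,) position in R^n
    wave_vectors: (m, n) one wave vector k_j per pair (hypothetical values)
    """
    angles = wave_vectors @ pos              # each angle can depend on every coordinate of pos
    cos, sin = np.cos(angles), np.sin(angles)
    even, odd = x[0::2], x[1::2]
    out = np.empty_like(x, dtype=float)
    out[0::2] = even * cos - odd * sin       # standard 2D rotation of each pair
    out[1::2] = even * sin + odd * cos
    return out


def score(query, key, p_q, p_k, wave_vectors) -> float:
    """Dot-product attention logit between a query at p_q and a key at p_k."""
    return float(rotate_nd(query, p_q, wave_vectors) @ rotate_nd(key, p_k, wave_vectors))


rng = np.random.default_rng(0)
n, m = 3, 4                                  # 3D positions, 4 rotation pairs (head dim 8)
wave = rng.normal(size=(m, n))               # placeholder wave vectors, not the paper's design
query, key = rng.normal(size=2 * m), rng.normal(size=2 * m)
p_q, p_k, shift = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)

# Translating query and key positions by the same vector leaves the logit unchanged,
# because pair j contributes query_j^T R(k_j . (p_k - p_q)) key_j.
assert np.isclose(score(query, key, p_q, p_k, wave),
                  score(query, key, p_q + shift, p_k + shift, wave))
```

Axis-wise RoPE corresponds to the special case in which every \(k_j\) has a single nonzero entry, for example rows of the identity matrix. Allowing several nonzero entries lets one rotation angle depend on several coordinates at once.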
📝 Abstract
Rotary Position Embedding (RoPE) is widely adopted in Transformer models, yet its extension to high-dimensional domains lacks a unified theoretical formulation. Most existing approaches either apply rotations independently along each axis or empirically mix frequencies, which limits cross-dimensional interactions and yields direction-dependent representations. To address these limitations, we propose nD-RoPE, a decomposition-free generalization of RoPE to arbitrary dimensions. From a translation-invariant formulation in continuous Hilbert space, we derive a spectral condition for isotropy that requires treating positions and frequencies as coupled \(n\)-dimensional vectors. We instantiate this formulation with a multi-scale regular-simplex wave-vector design, which provides non-degenerate spatial coverage and a symmetric, directionally balanced second-order response. Experiments across images, videos, and point clouds demonstrate consistent performance gains and improved generalization in high-dimensional settings.
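The abstract does not spell out the simplex construction. The sketch below assumes the directions are the \(n+1\) unit vertices \(u_i\) of a regular simplex centered at the origin in \(\mathbb{R}^n\), repeated at several scales. It also offers one possible reading of "directionally balanced second-order response". Expanding \(\cos(k \cdot \Delta p) \approx 1 - (k \cdot \Delta p)^2 / 2\), the quadratic term summed over directions is \(\sum_i (u_i \cdot \Delta p)^2\). Centered unit simplex vertices satisfy \(\sum_i u_i u_i^\top = \tfrac{n+1}{n} I\), so this term depends only on \(\lVert \Delta p \rVert\). The function names and scales are hypothetical. NumPy is assumed.

```python
import numpy as np


def regular_simplex(n: int) -> np.ndarray:
    """Return the n + 1 unit-norm vertices of a regular simplex in R^n, centered at 0."""
    centered = np.eye(n + 1) - 1.0 / (n + 1)   # rows are e_i - centroid; all lie in the plane sum(x) = 0
    _, _, vt = np.linalg.svd(centered)
    basis = vt[:n]                             # orthonormal basis of that n-dimensional plane
    verts = centered @ basis.T                 # (n + 1, n): coordinates of each vertex in R^n
    return verts / np.linalg.norm(verts, axis=1, keepdims=True)


def multiscale_wave_vectors(n: int, scales) -> np.ndarray:
    """Hypothetical multi-scale design: every simplex direction repeated at every scale."""
    dirs = regular_simplex(n)
    return np.concatenate([s * dirs for s in scales], axis=0)


def second_order_response(wave_vectors: np.ndarray, delta: np.ndarray) -> float:
    """sum_i (k_i . delta)^2, i.e. the quadratic term of sum_i cos(k_i . delta) divided by -1/2."""
    return float(np.sum((wave_vectors @ delta) ** 2))


n = 3
u = regular_simplex(n)
print(np.round(u @ u.T, 6))     # 1 on the diagonal, -1/n off it
print(np.round(u.T @ u, 6))     # (n + 1) / n times the identity (a tight frame)

rng = np.random.default_rng(0)
for _ in range(3):
    d = rng.normal(size=n)
    d /= np.linalg.norm(d)                     # random unit displacement
    print(second_order_response(u, d))         # (n + 1) / n in every direction, up to rounding

# The balance survives scaling: each scale s contributes s^2 * (n + 1) / n * |d|^2.
wave = multiscale_wave_vectors(n, scales=[1.0, 0.1, 0.01])
print(wave.shape)                              # (n + 1) directions times 3 scales

# A lopsided 2D set for contrast: here the response depends on direction.
skewed = np.array([[1.0, 0.0], [0.0, 1.0], [2 ** -0.5, 2 ** -0.5]])
print(second_order_response(skewed, np.array([1.0, 0.0])))              # approximately 1.5
print(second_order_response(skewed, np.array([2 ** -0.5, 2 ** -0.5])))  # approximately 2.0
```

The identity alone does not single out the simplex: the \(n\) axis-aligned unit vectors also satisfy \(\sum_i e_i e_i^\top = I\). Under this reading, the simplex adds directions that each mix all coordinates while keeping the same balance.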