🤖 AI Summary
To address the challenge of simultaneously achieving rotational equivariance and geometric fidelity in modeling spherical data (e.g., atmospheric dynamics, cosmic microwave background, robotic vision), this paper introduces the first equivariant Transformer architecture designed specifically for the spherical domain. Methodologically, it proposes: (1) a weighted attention mechanism grounded in spherical numerical integration, yielding approximate SO(3)-equivariance; (2) geodesic neighborhood attention, which incorporates local geometric priors to enhance generalization and scalability; and (3) custom CUDA kernels with memory-efficient implementations. Evaluated on shallow-water equation simulation on the rotating sphere, spherical image segmentation, and spherical depth estimation, the model consistently outperforms planar Transformer baselines. The results indicate that explicit geometric priors, particularly rotational equivariance and geodesic locality, are critical for improving learning performance on spherical domains.
📝 Abstract
We introduce a generalized attention mechanism for spherical domains, enabling Transformer architectures to natively process data defined on the two-dimensional sphere, a critical need in fields such as atmospheric physics, cosmology, and robotics, where preserving spherical symmetries and topology is essential for physical accuracy. By integrating numerical quadrature weights into the attention mechanism, we obtain a geometrically faithful spherical attention that is approximately rotationally equivariant, providing strong inductive biases and leading to better performance than Cartesian approaches. To further enhance both scalability and model performance, we propose neighborhood attention on the sphere, which confines interactions to geodesic neighborhoods. This approach reduces computational complexity and introduces an additional inductive bias toward locality, while retaining the symmetry properties of our method. We provide optimized CUDA kernels and memory-efficient implementations to ensure practical applicability. The method is validated on three diverse tasks: simulating shallow water equations on the rotating sphere, spherical image segmentation, and spherical depth estimation. Across all tasks, our spherical Transformers consistently outperform their planar counterparts, highlighting the advantage of geometric priors for learning on spherical domains.
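The two ideas in the abstract, quadrature-weighted attention and geodesic neighborhood attention, can be illustrated with a small NumPy sketch. This is not the paper's implementation: the cell-centred equiangular grid, the midpoint-rule weights, the function names `equiangular_grid` and `spherical_attention`, and the `radius` parameter are all assumptions made for illustration. Each key's softmax term is multiplied by the area of its grid cell, so the normalised sum approximates an integral over the sphere rather than a plain sum over grid points, which would over-count the densely sampled poles.

```python
import numpy as np

def equiangular_grid(nlat, nlon):
    """Cell-centred lat-lon grid: unit vectors (N, 3) and quadrature weights (N,)."""
    dtheta, dphi = np.pi / nlat, 2 * np.pi / nlon
    theta = (np.arange(nlat) + 0.5) * dtheta       # colatitude of cell centres
    phi = np.arange(nlon) * dphi                   # longitude
    T, P = np.meshgrid(theta, phi, indexing="ij")
    xyz = np.stack([np.sin(T) * np.cos(P), np.sin(T) * np.sin(P), np.cos(T)], axis=-1)
    w = np.sin(T) * dtheta * dphi                  # midpoint-rule area element (assumed)
    return xyz.reshape(-1, 3), w.reshape(-1)

def spherical_attention(q, k, v, w, xyz=None, radius=None):
    """Quadrature-weighted softmax attention with an optional geodesic mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if radius is not None:
        # Great-circle distance between every query point and every key point.
        geo = np.arccos(np.clip(xyz @ xyz.T, -1.0, 1.0))
        scores = np.where(geo <= radius, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    a = np.exp(scores) * w[None, :]                # weight each key by its cell area
    a = a / a.sum(axis=-1, keepdims=True)          # discrete analogue of the integral ratio
    return a @ v

# Hypothetical usage on a coarse 8 x 16 grid with random features.
xyz, w = equiangular_grid(8, 16)
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((xyz.shape[0], 4)) for _ in range(3))
out_global = spherical_attention(q, k, v, w)                        # all-to-all
out_local = spherical_attention(q, k, v, w, xyz=xyz, radius=0.5)    # geodesic neighborhood
```

The dense mask here still builds the full N x N score matrix, so it shows the locality prior but not the complexity reduction; a practical neighborhood implementation would only evaluate the pairs inside each geodesic disk.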