🤖 AI Summary
This work addresses the limitations of conventional sequence models in capturing geometric and physical relationships: poor structural generalization, high computational cost, and limited interpretability. The authors propose Versor, a novel architecture that integrates conformal geometric algebra (CGA) into sequence modeling for the first time. By embedding states in $Cl_{4,1}$ and using versors to enact $SE(3)$-equivariant geometric evolution, Versor obviates the need for explicit structural encodings. The approach natively supports linear $O(L)$ complexity, zero-shot extrapolation across scales, and highly interpretable attention. Experiments show Versor significantly outperforming Transformers and GATr on N-body dynamics, topological reasoning, and multimodal tasks while using 200× fewer parameters. Notably, it attains 99.3% MCC on zero-shot topological tasks and accelerates Clifford kernel computations by up to 78×.
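The versor-based evolution described above rests on a standard geometric-algebra fact: a rotor $R$ transforms an element $x$ via the sandwich product $x \mapsto R\,x\,\tilde{R}$. A minimal sketch of this mechanism in the simpler Euclidean algebra $Cl_{3,0}$ follows; the conformal $Cl_{4,1}$ case uses the same blade-product machinery with a different metric signature. All function names here are illustrative, not the paper's implementation.

```python
import math

def blade_sign(a, b):
    """Sign from reordering basis vectors when multiplying blades a*b
    (bitmask encoding: 0b001 = e1, 0b010 = e2, 0b100 = e3; Euclidean metric)."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1.0 if swaps & 1 else 1.0

def gp(x, y):
    """Geometric product of two multivectors, stored as {blade_bitmask: coeff}."""
    out = {}
    for a, xa in x.items():
        for b, yb in y.items():
            c = a ^ b  # resulting blade (shared basis vectors square to +1)
            out[c] = out.get(c, 0.0) + blade_sign(a, b) * xa * yb
    return out

def reverse(x):
    """Reversion ~x: flips the sign of grade-2 and grade-3 parts."""
    return {b: (-v if bin(b).count("1") % 4 in (2, 3) else v) for b, v in x.items()}

theta = math.pi / 3
# Rotor R = exp(-theta/2 * e12) = cos(theta/2) - sin(theta/2) e12
R = {0b000: math.cos(theta / 2), 0b011: -math.sin(theta / 2)}
v = {0b001: 1.0}  # the basis vector e1
rotated = gp(gp(R, v), reverse(R))  # sandwich product R v ~R
# rotated is cos(theta) e1 + sin(theta) e2: a rotation by theta in the e1-e2 plane
```

Because rotor composition is itself a geometric product, chaining such transformations step by step is what allows sequence states to evolve without any explicit positional or structural encoding.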
📝 Abstract
We introduce Versor, a novel sequence architecture that replaces the traditional non-linear primitives with operations from Conformal Geometric Algebra (CGA), achieving structural generalization and significant performance gains across a variety of tasks while improving interpretability and efficiency. By embedding states in the $Cl_{4,1}$ manifold and evolving them via geometric transformations (rotors), Versor natively represents $SE(3)$-equivariant relationships without requiring explicit structural encoding. Versor is validated on chaotic N-body dynamics, topological reasoning, and standard multimodal benchmarks (CIFAR-10, WikiText-103), consistently outperforming Transformers, graph networks, and geometric baselines (GATr, EGNN). Key results include: orders-of-magnitude fewer parameters ($200\times$ fewer than Transformers); attention that decomposes interpretably into proximity and orientation components; zero-shot scale generalization (99.3% MCC on topology vs. 50.4% for ViT); and $O(L)$ linear complexity via the novel Recursive Rotor Accumulator. In out-of-distribution tests, Versor maintains stable predictions where Transformers fail catastrophically. Custom Clifford kernels achieve up to $78\times$ speedups, providing a scalable foundation for geometrically aware scientific modeling.
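The abstract names the Recursive Rotor Accumulator without detail. Assuming it amounts to a left-to-right prefix composition of per-token rotors (which is what yields a single $O(L)$ scan rather than $O(L^2)$ pairwise attention), it can be sketched with unit quaternions, the rotors of 3D geometric algebra. `qmul` and `rotor_scan` are hypothetical names, not the authors' API.

```python
import math
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions (w, x, y, z); unit quaternions act as 3D rotors."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotor_scan(rotors):
    """Prefix composition state_t = R_t * state_{t-1}: one pass, O(L) in sequence length."""
    state = np.array([1.0, 0.0, 0.0, 0.0])  # identity rotor
    states = []
    for R in rotors:
        state = qmul(R, state)
        state /= np.linalg.norm(state)  # keep the accumulated rotor unit-norm
        states.append(state)
    return states

# Example: three successive 30-degree rotations about z compose into one 90-degree rotation.
step = np.array([math.cos(math.pi / 12), 0.0, 0.0, math.sin(math.pi / 12)])
final = rotor_scan([step] * 3)[-1]
```

The per-step renormalization mirrors a common numerical-stability choice in recurrent geometric models: composing many rotors in floating point slowly drifts off the unit manifold, and projecting back is cheap.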