🤖 AI Summary
Point cloud registration (PCR) faces a trade-off between modeling capacity and computational efficiency: Transformers achieve strong long-range dependency modeling but suffer from quadratic complexity, limiting scalability to high-resolution point clouds; conversely, Mamba offers linear complexity and inherent long-range modeling yet struggles with the unordered, irregular structure of point clouds. To bridge this gap, we propose MT-PCR, the first Mamba–Transformer hybrid framework for PCR. Our method introduces a Z-order spatial serialization scheme that explicitly preserves geometric locality, and removes the order-indicator module commonly used in Mamba-based sequence modeling, which improves performance in this setting. Further, we adopt a hierarchical architecture: a Mamba backbone efficiently encodes global context, while a lightweight Transformer performs fine-grained refinement, synergistically integrating state-space modeling (SSM) and self-attention. Evaluated on multiple benchmarks, our approach surpasses state-of-the-art methods in registration accuracy while reducing GPU memory consumption by 38% and FLOPs by 52%, achieving a superior accuracy–efficiency trade-off.
📝 Abstract
Point cloud registration (PCR) is a fundamental task in 3D computer vision and robotics. Most existing learning-based PCR methods rely on Transformers, which suffer from quadratic computational complexity. This limitation restricts the resolution of point clouds that can be processed, inevitably leading to information loss. In contrast, Mamba, a recently proposed model based on state space models (SSMs), achieves linear computational complexity while maintaining strong long-range contextual modeling capabilities. However, directly applying Mamba to PCR tasks yields suboptimal performance due to the unordered and irregular nature of point cloud data. To address this challenge, we propose MT-PCR, the first point cloud registration framework that integrates both Mamba and Transformer modules. Specifically, we serialize point cloud features using Z-order space-filling curves to enforce spatial locality, enabling Mamba to better model the geometric structure of the input. Additionally, we remove the order indicator module commonly used in Mamba-based sequence modeling, which leads to improved performance in our setting. The serialized features are then processed by an optimized Mamba encoder, followed by a Transformer refinement stage. Extensive experiments on multiple benchmarks demonstrate that MT-PCR outperforms Transformer-based and concurrent state-of-the-art methods in both accuracy and efficiency, while significantly reducing GPU memory usage and FLOPs.
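To make the serialization step concrete: Z-order (Morton-code) ordering quantizes each point's coordinates and interleaves their bits, so that points close in 3D space tend to land near each other in the resulting 1D sequence. The sketch below is a minimal, generic illustration of this idea, not the paper's implementation; the quantization depth (`bits=10`) and the stable-sort tie-breaking are assumptions for demonstration.

```python
import numpy as np

def part1by2(n: int) -> int:
    # Spread the low 10 bits of n so two zero bits separate each original bit
    # (standard bit-interleaving masks for 3D Morton codes).
    n &= 0x3FF
    n = (n | (n << 16)) & 0x030000FF
    n = (n | (n << 8)) & 0x0300F00F
    n = (n | (n << 4)) & 0x030C30C3
    n = (n | (n << 2)) & 0x09249249
    return n

def morton3d(x: int, y: int, z: int) -> int:
    # Interleave bits as z y x, z y x, ... (x occupies the lowest bit).
    return (part1by2(z) << 2) | (part1by2(y) << 1) | part1by2(x)

def z_order_serialize(points, bits: int = 10) -> np.ndarray:
    """Return indices that reorder `points` (N x 3) along a Z-order curve."""
    pts = np.asarray(points, dtype=np.float64)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    # Quantize each axis to integers in [0, 2**bits) before interleaving.
    scale = (2**bits - 1) / np.maximum(maxs - mins, 1e-9)
    q = ((pts - mins) * scale).astype(np.int64)
    keys = np.array([morton3d(int(x), int(y), int(z)) for x, y, z in q])
    return np.argsort(keys, kind="stable")
```

Features reordered by these indices can then be fed to a Mamba encoder as an ordinary 1D sequence, with spatial neighbors largely adjacent in the sequence.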