🤖 AI Summary
Point cloud registration methods used in remote sensing and digital heritage 3D reconstruction typically ignore point orientation and point uncertainty. This makes them sensitive to noise and to large rotations such as orthogonal transformations, and forces heavy reliance on transformation-augmented training data. To address this, we propose a surfel-based SE(3)-equivariant pose regression framework: surfels are initialized from LiDAR point clouds using virtual perspective camera parameters; an SE(3)-equivariant convolutional encoder jointly extracts position and rotation features; a cross-attention mechanism computes similarity between source and target features; and a fully connected decoder, trained with a Huber loss, predicts the relative transformation. This work is the first to introduce SE(3) equivariance into surfel representations, explicitly learning features that are equivariant in both position and rotation. Evaluated on indoor and outdoor real-world datasets, our method outperforms state-of-the-art methods, with higher registration accuracy and improved robustness against noise and large rotations, while reducing dependence on data augmentation.
📝 Abstract
Point cloud registration is crucial for ensuring consistent 3D alignment of multiple local point clouds in 3D reconstruction for remote sensing and digital heritage. Although many point cloud registration methods exist, both non-learning and learning-based, they ignore point orientations and point uncertainties. This makes them susceptible to noisy input and to aggressive rotations of the input point cloud, such as orthogonal transformations, so they require extensive training point clouds with transformation augmentations. To address these issues, we propose a novel surfel-based pose regression approach. Our method initializes surfels from LiDAR point clouds using virtual perspective camera parameters and learns explicit $\mathbf{SE(3)}$-equivariant features, covering both position and rotation, through $\mathbf{SE(3)}$-equivariant convolutional kernels in order to predict the relative transformation between source and target scans. The model comprises an equivariant convolutional encoder, a cross-attention mechanism for similarity computation, a fully connected decoder, and a non-linear Huber loss. Experimental results on indoor and outdoor datasets demonstrate our model's superiority and robustness on real point cloud scans compared to state-of-the-art methods.
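The abstract names a Huber loss but gives neither its threshold nor the pose parameterization. As a rough illustration only, the sketch below implements the standard Huber loss over a hypothetical pose vector (translation followed by rotation parameters). The function names `huber` and `huber_pose_loss`, the threshold `delta`, and the 6-element pose layout are all assumptions for this example, not details taken from the paper.

```python
from typing import Sequence


def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber penalty: quadratic near zero, linear beyond delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def huber_pose_loss(
    pred: Sequence[float], target: Sequence[float], delta: float = 1.0
) -> float:
    """Mean Huber penalty over the elements of a pose vector.

    The pose layout (e.g. 3 translation + 3 rotation parameters) is a
    hypothetical choice for illustration; the paper may use a different one.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")
    total = sum(huber(p - t, delta) for p, t in zip(pred, target))
    return total / len(pred)


if __name__ == "__main__":
    # Hypothetical predicted vs. ground-truth relative pose.
    predicted = [0.10, 0.00, 3.00, 0.00, 0.50, 0.00]
    ground_truth = [0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
    print(huber_pose_loss(predicted, ground_truth))
```

Because the penalty grows only linearly for residuals larger than `delta`, a single badly estimated component (here the third element) contributes less than it would under a squared-error loss, which is the usual motivation for choosing Huber when the input scans are noisy.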