🤖 AI Summary
Repetitive structural patterns on construction sites, such as uniform wall surfaces and near-identical apartment layouts, induce perceptual aliasing in LiDAR-based localization, severely hindering autonomous measurement and inspection by mobile robots. To address this, we propose a LiDAR-only global relocalization method that introduces diffusion models into LiDAR place recognition for the first time. Our approach uses a PointNet++ backbone to condition the diffusion model on a single LiDAR scan, enabling an end-to-end mapping from that scan to a multimodal probability distribution over global positions. The model is trained on synthetic LiDAR data generated by simulation in an accurate, large-scale mesh of a real building. Evaluated on five real-world datasets, our method achieves 77% place recognition accuracy (within ±2 m) and halves the mean localization error compared to state-of-the-art baselines, demonstrating significantly improved robustness and generalization in complex indoor construction environments.
📝 Abstract
Mobile robots on construction sites require accurate pose estimation to perform autonomous surveying and inspection missions. Localization on construction sites is a particularly challenging problem due to repetitive features such as flat plastered walls, and due to perceptual aliasing caused by apartments with similar layouts both within and across floors. In this paper, we focus on the global re-positioning of a robot with respect to an accurately scanned mesh of the building, using LiDAR data alone. In our approach, a neural network is trained on synthetic LiDAR point clouds generated by simulating a LiDAR sensor inside an accurate, large-scale mesh of a real building. We train a diffusion model with a PointNet++ backbone, which allows us to model multiple position candidates from a single LiDAR point cloud. The resulting model can successfully predict the global position of the LiDAR in confined and complex sites despite the adverse effects of perceptual aliasing, since the learned distribution over potential global positions is multi-modal. We evaluate our approach across five real-world datasets, achieving an average place recognition accuracy of 77% (within ±2 m) and outperforming baselines by a factor of 2 in mean error.
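To illustrate the core idea of sampling a multimodal position distribution rather than a single pose estimate, here is a minimal toy sketch of DDPM-style reverse sampling. It is not the paper's implementation: the real system conditions a learned denoiser on PointNet++ features of the scan, whereas `denoiser` below is a hypothetical stand-in that pulls samples toward two hand-coded position modes (standing in for two aliased apartment layouts), and all names and parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical (x, y) position modes caused by perceptual aliasing,
# e.g. identical apartment layouts on opposite sides of a corridor.
MODES = np.array([[2.0, 5.0], [12.0, 5.0]])

def denoiser(x, t, cond):
    # Stand-in for the learned network: returns the "noise" pointing
    # away from each sample's nearest mode. In the paper's setup, `cond`
    # would be the PointNet++ embedding of the LiDAR scan; it is unused
    # in this toy version.
    dists = np.linalg.norm(x[:, None, :] - MODES[None, :, :], axis=-1)
    nearest = MODES[np.argmin(dists, axis=1)]
    return x - nearest

def sample_positions(n_candidates=64, n_steps=50, cond=None):
    # Simplified reverse diffusion with a fixed step size: start from a
    # broad Gaussian and iteratively denoise toward the position modes,
    # re-injecting a little noise at each step except the last.
    x = rng.normal(loc=7.0, scale=5.0, size=(n_candidates, 2))
    for t in range(n_steps, 0, -1):
        x = x - 0.2 * denoiser(x, t, cond)
        if t > 1:
            x = x + 0.05 * rng.normal(size=x.shape)
    return x

candidates = sample_positions()
```

The candidate set clusters around both modes, exposing the multimodal position distribution instead of collapsing to a single (possibly wrong) average, which is the behavior the abstract attributes to the diffusion model.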