🤖 AI Summary
This work investigates how the choice of LiDAR-to-image 2D projection affects metric-scale place recognition when paired with vision foundation models. Because prior methods confound projection design with backbone architecture, we propose a modular retrieval framework that decouples the projection module from the backbone network and the feature aggregation strategy. This enables, for the first time, a systematic evaluation of how the projection alone contributes to discriminability, environmental robustness, and real-time efficiency. Using a unified two-channel (geometric and structural) representation across multiple datasets and state-of-the-art vision foundation models, we empirically validate projection design in real-world scenarios. Experiments show that the best-performing 2D projection significantly improves robustness under illumination and seasonal variations, runs 3.2× faster at inference than end-to-end 3D approaches, and attains 97.6% of their accuracy, demonstrating that carefully engineered 2D projections are an efficient and practical alternative to full 3D learning.
📝 Abstract
This work presents a systematic investigation into how alternative LiDAR-to-image projections affect metric place recognition when coupled with a state-of-the-art vision foundation model. We introduce a modular retrieval pipeline that controls for backbone, aggregation, and evaluation protocol, thereby isolating the influence of the 2-D projection itself. Using consistent geometric and structural channels across multiple datasets and deployment scenarios, we identify the projection characteristics that most strongly determine discriminative power, robustness to environmental variation, and suitability for real-time autonomy. Experiments on multiple datasets, together with integration into an operational place recognition policy, validate the practical relevance of these findings and demonstrate that carefully designed projections can serve as an effective surrogate for end-to-end 3-D learning in LiDAR place recognition.
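For readers unfamiliar with what a LiDAR-to-image projection looks like in practice, the sketch below shows one common choice, a spherical (range-image) projection, in plain Python. The abstract does not say which projections the paper compares, so this is only an illustrative assumption; the function name `spherical_projection`, the image size, and the vertical field-of-view values are hypothetical and not taken from the paper.

```python
import math

def spherical_projection(points, height=16, width=64,
                         fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project (x, y, z) points onto a height x width range image.

    Illustrative only: sizes and field of view are assumed values.
    Empty pixels hold 0.0; if several points land on the same pixel,
    the nearest one is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]

    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate returns at the sensor origin
        yaw = math.atan2(y, x)      # horizontal angle in [-pi, pi]
        pitch = math.asin(z / r)    # vertical angle
        if not (fov_down <= pitch <= fov_up):
            continue  # outside the assumed vertical field of view

        # Map angles to pixel coordinates, then clamp to the image bounds.
        u = int(0.5 * (1.0 - yaw / math.pi) * width)
        v = int((fov_up - pitch) / fov * height)
        u = min(max(u, 0), width - 1)
        v = min(max(v, 0), height - 1)

        if image[v][u] == 0.0 or r < image[v][u]:
            image[v][u] = r
    return image


if __name__ == "__main__":
    # Two points along the same ray and one straight up (outside the FOV).
    cloud = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    img = spherical_projection(cloud)
    print(img[8][32])
```

An image of this kind can be fed to a 2-D vision backbone in place of the raw point cloud. In a modular pipeline such as the one the abstract describes, a different projection could be substituted at this step while the backbone and aggregation stay fixed.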