🤖 AI Summary
This work addresses the safety-critical issue of high-confidence false detections on out-of-distribution (OOD) objects in LiDAR-based 3D object detection by introducing, for the first time, a vision-language model (VLM) to OOD detection in LiDAR point clouds. Through a cross-modal alignment mechanism, the method maps object features extracted from LiDAR point clouds into a language embedding space, reformulating OOD detection as a zero-shot classification task. Notably, this approach requires no OOD data during training and substantially improves the model's ability to recognize unknown categories. Evaluated on the nuScenes OOD benchmark, it achieves competitive performance, demonstrating the effectiveness and potential of leveraging linguistic priors for LiDAR-based OOD detection.
📝 Abstract
LiDAR-based 3D object detection plays a critical role in reliable and safe autonomous driving systems. However, existing detectors often produce overly confident predictions for out-of-distribution (OOD) objects, i.e., objects from categories not present in the training data, which leads to incorrect predictions and poses significant safety risks. To address this challenge, we propose ALOOD (Aligned LiDAR representations for Out-Of-Distribution Detection), a novel approach that incorporates language representations from a vision-language model (VLM). By aligning the object features from the object detector to the feature space of the VLM, we can treat the detection of OOD objects as a zero-shot classification task. We demonstrate competitive performance on the nuScenes OOD benchmark, establishing a novel approach to OOD object detection in LiDAR using language representations. The source code is available at https://github.com/uulm-mrm/mmood3d.
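The core idea of casting OOD detection as zero-shot classification in a shared language embedding space can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function `ood_score`, the toy embeddings, and the thresholding convention are assumptions for illustration; the actual method aligns detector features to a real VLM's text space.

```python
import numpy as np

def ood_score(object_feat: np.ndarray, text_embeds: np.ndarray) -> float:
    """Hypothetical OOD score for one detected object.

    object_feat: LiDAR object feature already aligned (mapped) into
                 the VLM's language embedding space, shape (D,).
    text_embeds: text embeddings of the known (in-distribution) class
                 names, shape (K, D).
    Zero-shot intuition: if the feature is not similar to ANY known
    class prompt, the object is likely out-of-distribution.
    Returns a score in [0, 2]; higher means more likely OOD.
    """
    # Cosine similarity = dot product of L2-normalized vectors.
    f = object_feat / np.linalg.norm(object_feat)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sims = t @ f                      # similarity to each known class
    return 1.0 - float(sims.max())   # low max similarity -> high OOD score

# Toy example with 3 known classes in a 4-d embedding space.
text_embeds = np.eye(4)[:3]                       # stand-in class prompts
feat_known = np.array([1.0, 0.05, 0.0, 0.0])      # close to class 0
feat_unknown = np.array([0.0, 0.0, 0.0, 1.0])     # far from all classes
print(ood_score(feat_known, text_embeds) < ood_score(feat_unknown, text_embeds))
```

In practice the text embeddings would come from the VLM's text encoder applied to the known class names, and a threshold on the score (or an equivalent calibrated rule) would separate in-distribution from OOD detections.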