🤖 AI Summary
Acquiring and annotating real-world LiDAR data for semantic segmentation is costly, time-consuming, and labor-intensive. To address this, the paper introduces SynthmanticLiDAR, presented as the first synthetic dataset deeply customized for LiDAR semantic segmentation. It is built on what the authors describe as the first semantic-segmentation-oriented modification of the CARLA simulator, which introduces 16 LiDAR-compatible semantic classes, standardizes annotation protocols, enables controllable class distribution generation, and provides a high-fidelity LiDAR sensor model coupled with a per-point semantic labeling pipeline. Using transfer learning from the synthetic source domain (SynthmanticLiDAR) to real-world target domains (e.g., SemanticKITTI), the authors report mIoU improvements of 3.2 to 5.7 percentage points on state-of-the-art models including SalsaNext and Cylinder3D. Both the dataset and the simulation toolkit are publicly released.
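For readers unfamiliar with the metric, the sketch below shows how mean intersection-over-union (mIoU) is typically computed from per-point labels. It is a generic illustration, not the paper's evaluation code: the function name `mean_iou`, the `ignore_index` convention, and the toy labels are assumptions made here.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = -1) -> float:
    """Mean IoU over the classes that appear in the predictions or ground truth.

    Assumes every prediction is a class id in [0, num_classes); ground-truth
    points labeled `ignore_index` are skipped (a hypothetical convention).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if not (0 <= p < num_classes and 0 <= t < num_classes):
            raise ValueError(f"label out of range: pred={p}, target={t}")
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and five points.
score = mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
```

An improvement of a few percentage points in this score means the per-class overlap between predicted and ground-truth points rose on average, which is how gains on benchmarks such as SemanticKITTI are usually reported.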
📝 Abstract
Semantic segmentation on LiDAR imaging is gaining increasing attention, as it provides useful knowledge for perception systems and has potential for autonomous driving. However, collecting and labeling real LiDAR data is expensive and time-consuming. While datasets such as SemanticKITTI have been manually collected and labeled, simulation tools such as CARLA have enabled the creation of synthetic datasets on demand. In this work, we present a modified CARLA simulator designed with LiDAR semantic segmentation in mind. It adds new classes, labels objects more consistently with their counterparts in real datasets such as SemanticKITTI, and allows the object class distribution to be adjusted. Using this tool, we generated SynthmanticLiDAR, a synthetic dataset for semantic segmentation on LiDAR imaging designed to be similar to SemanticKITTI, and we evaluate its contribution to the training of different semantic segmentation algorithms using a naive transfer learning approach. Our results show that incorporating SynthmanticLiDAR into training improves the overall performance of the tested algorithms, demonstrating the usefulness of both our dataset and our adapted CARLA simulator. The dataset and simulator are available at https://github.com/vpulab/SynthmanticLiDAR.
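One way to check how closely a synthetic dataset's class distribution matches a real one is to compare per-class point frequencies. The sketch below is an illustration under stated assumptions, not part of the released toolkit: the helper names `class_distribution` and `total_variation`, and the toy label lists, are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def class_distribution(labels: Iterable[int]) -> Dict[int, float]:
    """Fraction of points belonging to each class id."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


def total_variation(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Total variation distance between two class distributions (0 = identical)."""
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Toy per-point labels standing in for a synthetic and a real scan.
synthetic = class_distribution([0, 0, 1, 1])
real = class_distribution([0, 1, 1, 1])
gap = total_variation(synthetic, real)
```

A smaller distance indicates that the simulated scenes contain classes in proportions closer to the real data, which is the kind of property an adjustable class distribution is meant to control.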