SAM-Enhanced Segmentation on Road Datasets: Balancing Critical Classes in Autonomous Driving

📅 2026-05-27
🤖 AI Summary
This work addresses two limitations of large-scale multimodal autonomous driving datasets such as the Zenseact Open Dataset (ZOD): they lack pixel-level semantic segmentation annotations, and they exhibit severe class imbalance, with pedestrians, cyclists, and signs making up less than 1% of pixels. The authors propose an automatic annotation pipeline based on the Segment Anything Model (SAM) that converts ZOD's bounding-box labels into dense pixel-wise masks. More than 100,000 frames are processed, and a 2,300-frame subset is manually curated to provide a reliable baseline. On these annotations, the transformer-based CLFT and CNN-based DeepLabV3+ architectures are evaluated across weather conditions, with CLFT-Hybrid reaching up to 48.1% mIoU, and specialized models targeting rare classes are explored to counter the imbalance. The pipeline is also validated on the Iseauto platform, where it reaches 77.5% mIoU, and bidirectional transfer learning shows that SAM-derived representations transfer across sensor configurations.
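For readers unfamiliar with the metric behind the 48.1% and 77.5% figures, the following is a minimal sketch of how mean Intersection over Union (mIoU) is commonly computed from a confusion matrix. It is not taken from the paper: the function names, the three-class toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the authors' evaluation protocol may differ.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Count pixels per (target class, predicted class) pair."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Encode each (target, pred) pair as a single index, then histogram it.
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(cm):
    """IoU per class: TP / (TP + FP + FN). Absent classes yield NaN."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, tp / union, np.nan)


def mean_iou(cm):
    """Average IoU over classes that appear in prediction or ground truth."""
    return float(np.nanmean(per_class_iou(cm)))


# Toy example with hypothetical class ids: 0 = background, 1 = vehicle, 2 = pedestrian.
target = np.array([[0, 0, 1], [0, 2, 2]])
pred = np.array([[0, 0, 1], [0, 0, 2]])
cm = confusion_matrix(pred, target, num_classes=3)
print(per_class_iou(cm))  # per-class IoU: 0.75, 1.0, 0.5
print(mean_iou(cm))       # 0.75
```

Because every class contributes equally to the average, a single missed pedestrian pixel in this toy case lowers mIoU far more than it would lower pixel accuracy, which is why rare classes weigh heavily on the reported scores.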
📝 Abstract
Dense semantic segmentation is essential for autonomous driving, yet many multi-modal datasets lack pixel-level annotations. The Zenseact Open Dataset (ZOD) provides rich multi-sensor data but only bounding-box labels, limiting its use for segmentation research. Our primary contribution is a Segment Anything Model (SAM)-based annotation pipeline that produces dense, pixel-level annotations for ZOD by converting bounding boxes into semantic masks. In this pilot study, we process over 100,000 frames and manually curate a 2,300-frame subset (36% acceptance rate) to establish a reliable baseline. Using these annotations, we evaluate transformer-based CLFT and CNN-based DeepLabV3+ architectures across diverse weather conditions, achieving up to 48.1% mIoU with CLFT-Hybrid. To address extreme class imbalance, where pedestrians, cyclists, and signs constitute less than 1% of pixels, we explore specialized models targeting rare classes. We further validate the pipeline on the Iseauto autonomous-vehicle platform, achieving 77.5% mIoU, and show that SAM-derived representations transfer effectively across sensor configurations via bidirectional transfer learning. All code and annotations are released to support reproducible research.
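The box-to-mask conversion described in the abstract can be pictured with the sketch below. It is a simplified illustration under stated assumptions, not the authors' pipeline: `box_fill_segmenter` is a hypothetical stand-in for a box-prompted segmenter such as SAM (it simply fills the box), the class ids are invented, and painting larger boxes first is one possible way to resolve overlaps. The last helper shows how per-class pixel share, the quantity behind the "less than 1% of pixels" observation, could be measured.

```python
import numpy as np

BACKGROUND = 0  # hypothetical id for unlabeled pixels


def box_fill_segmenter(image, box):
    """Hypothetical stand-in for a box-prompted segmenter such as SAM.

    Returns a boolean mask covering the whole box; a real segmenter
    would return the object's outline inside the box instead.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask


def box_area(box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


def boxes_to_semantic_mask(image, boxes, class_ids, segmenter=box_fill_segmenter):
    """Turn labeled boxes into one per-pixel class map."""
    h, w = image.shape[:2]
    semantic = np.full((h, w), BACKGROUND, dtype=np.uint8)
    # Assumption: paint large objects first so small ones on top stay visible.
    order = sorted(range(len(boxes)), key=lambda i: -box_area(boxes[i]))
    for i in order:
        semantic[segmenter(image, boxes[i])] = class_ids[i]
    return semantic


def class_pixel_share(semantic, num_classes):
    """Fraction of pixels assigned to each class."""
    counts = np.bincount(semantic.ravel(), minlength=num_classes)
    return counts / counts.sum()


# Toy 10x10 frame: one large "vehicle" box (id 1) and a small "pedestrian" box (id 2).
image = np.zeros((10, 10, 3), dtype=np.uint8)
boxes = [(0, 0, 10, 5), (2, 1, 4, 3)]  # (x0, y0, x1, y1), end-exclusive
class_ids = [1, 2]
semantic = boxes_to_semantic_mask(image, boxes, class_ids)
print(class_pixel_share(semantic, num_classes=3))  # background 0.50, vehicle 0.46, pedestrian 0.04
```

Swapping `box_fill_segmenter` for a function that wraps a real promptable model keeps the rest of the sketch unchanged, and the pixel-share output makes the imbalance between large and small object classes directly visible.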
Problem

Research questions and friction points this paper is trying to address.

semantic segmentation
class imbalance
autonomous driving
pixel-level annotation
multi-modal datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

SAM-based annotation
class imbalance mitigation
cross-sensor transfer learning
autonomous driving segmentation
pixel-level labeling