UniLiPs: Unified LiDAR Pseudo-Labeling with Geometry-Grounded Dynamic Scene Decomposition

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of leveraging vast unlabeled LiDAR data in autonomous driving while mitigating the high cost of manual annotation by proposing an unsupervised multimodal pseudo-labeling framework. By exploiting temporal geometric consistency across consecutive LiDAR scans, the method lifts and fuses semantic cues from 2D vision foundation models and textual prompts into 3D space. It introduces a geometry-prior-driven dynamic scene decomposition and iterative refinement mechanism, enabling joint 3D semantic segmentation, object detection, and point cloud densification within a unified framework. High-quality pseudo labels are generated through geometric–semantic consistency constraints, using temporally accumulated LiDAR maps as geometric priors combined with multimodal prompts. Evaluated on three datasets, the approach demonstrates strong generalization: without any human supervision, it reduces the mean absolute error (MAE) of depth prediction by 51.5% and 22.0% in the 80–150 m and 150–250 m ranges, respectively, using only a small fraction of its geometrically consistent, densified points.
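The "temporally accumulated LiDAR maps" used as geometric priors rest on a standard operation: registering consecutive sweeps into a common world frame with the per-sweep ego poses. A minimal sketch in plain Python, assuming 4x4 row-major homogeneous pose matrices and point lists (the function names and data layout are illustrative, not the paper's actual interface):

```python
def transform_point(pose, pt):
    """Apply a 4x4 homogeneous pose (row-major nested lists) to a 3D point."""
    x, y, z = pt
    return tuple(
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    )

def accumulate_sweeps(sweeps, poses):
    """Fuse several LiDAR sweeps (lists of 3D points in the sensor frame)
    into one densified world-frame point cloud using the ego poses."""
    world_points = []
    for points, pose in zip(sweeps, poses):
        world_points.extend(transform_point(pose, p) for p in points)
    return world_points
```

Static structure stays self-consistent in the accumulated map, while moving objects smear across it, which is the inconsistency signal the paper exploits for dynamic scene decomposition.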

๐Ÿ“ Abstract
Unlabeled LiDAR logs in autonomous driving are inherently a gold mine of dense 3D geometry hiding in plain sight - yet they are almost useless without human labels, a dominant cost barrier for autonomous-perception research. In this work we tackle this bottleneck by leveraging temporal-geometric consistency across LiDAR sweeps to lift and fuse cues from text and 2D vision foundation models directly into 3D, without any manual input. We introduce an unsupervised multi-modal pseudo-labeling method that relies on strong geometric priors learned from temporally accumulated LiDAR maps, along with a novel iterative update rule that enforces joint geometric-semantic consistency and, conversely, detects moving objects from inconsistencies. Our method simultaneously produces 3D semantic labels, 3D bounding boxes, and dense LiDAR scans, demonstrating robust generalization across three datasets. We experimentally validate that our method compares favorably to existing semantic segmentation and object detection pseudo-labeling methods, which often require additional manual supervision. We confirm that even a small fraction of our geometrically consistent, densified LiDAR reduces depth-prediction MAE by 51.5% and 22.0% in the 80-150 m and 150-250 m ranges, respectively.
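Lifting 2D foundation-model cues into 3D typically amounts to projecting each LiDAR point into the camera and reading the 2D semantic label at that pixel. A hedged sketch using a pinhole camera model (the function name, mask layout, and intrinsics are illustrative assumptions, not the paper's implementation):

```python
def lift_labels(points_cam, mask, fx, fy, cx, cy):
    """Assign each 3D point (camera frame) the 2D semantic label at its
    pinhole projection; points behind the camera or off-image get None."""
    h, w = len(mask), len(mask[0])
    labels = []
    for x, y, z in points_cam:
        if z <= 0:  # behind the camera: no valid projection
            labels.append(None)
            continue
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        labels.append(mask[v][u] if 0 <= u < w and 0 <= v < h else None)
    return labels
```

Per-point labels obtained this way are noisy at object boundaries; the paper's geometric-semantic consistency constraints over accumulated sweeps are what turn such raw projections into usable pseudo labels.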
Problem

Research questions and friction points this paper is trying to address.

LiDAR
pseudo-labeling
autonomous driving
3D semantic segmentation
unlabeled data
Innovation

Methods, ideas, or system contributions that make the work stand out.

pseudo-labeling
geometry-semantic consistency
LiDAR densification
unsupervised 3D perception
dynamic scene decomposition