🤖 AI Summary
To address the central challenges of online high-definition (HD) map construction, namely heavy reliance on costly 3D annotations and the resulting limits on generalization and scalability, this paper proposes MapRF, a weakly supervised NeRF-guided self-training framework. Methodologically, it trains on onboard 2D semantic image labels alone: a Neural Radiance Fields (NeRF) module conditioned on map predictions reconstructs view-consistent 3D geometry and semantics to produce high-quality pseudo-labels, which iteratively refine the map network in a self-training loop. To curb error accumulation during self-training, a Map-to-Ray Matching strategy aligns map predictions with camera rays derived from the 2D labels, enforcing cross-modal geometric-semantic consistency. Evaluated on Argoverse 2 and nuScenes, MapRF attains roughly 75% of fully supervised performance and clearly outperforms existing weakly supervised approaches, demonstrating the feasibility and effectiveness of online HD mapping driven solely by 2D labels.
📝 Abstract
Autonomous driving systems benefit from high-definition (HD) maps that provide critical information about road infrastructure. The online construction of HD maps offers a scalable approach to generating local maps from on-board sensors. However, existing methods typically rely on costly 3D map annotations for training, which limits their generalization and scalability across diverse driving environments. In this work, we propose MapRF, a weakly supervised framework that learns to construct 3D maps using only 2D image labels. To generate high-quality pseudo labels, we introduce a novel Neural Radiance Fields (NeRF) module conditioned on map predictions, which reconstructs view-consistent 3D geometry and semantics. These pseudo labels are then iteratively used to refine the map network in a self-training manner, enabling progressive improvement without additional supervision. Furthermore, to mitigate error accumulation during self-training, we propose a Map-to-Ray Matching strategy that aligns map predictions with camera rays derived from 2D labels. Extensive experiments on the Argoverse 2 and nuScenes datasets demonstrate that MapRF achieves performance approaching that of fully supervised methods, attaining around 75% of the fully supervised baseline while surpassing several existing approaches that likewise use only 2D labels. This highlights the potential of MapRF to enable scalable and cost-effective online HD map construction for autonomous driving.
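The core geometric idea behind Map-to-Ray Matching can be illustrated concretely: each labeled 2D pixel back-projects to a camera ray in world space, and a predicted 3D map point is considered consistent with the 2D supervision if it lies close to some such ray. The sketch below shows this matching cost under simplifying assumptions (a pinhole camera, point-to-nearest-ray distance as the consistency measure); the function names and the exact cost formulation are illustrative, not the paper's actual implementation.

```python
import numpy as np

def pixel_to_ray(u, v, K, cam_to_world):
    """Back-project a labeled pixel (u, v) into a world-space ray.
    K is the 3x3 pinhole intrinsics; cam_to_world is a 4x4 pose matrix.
    Returns (origin, unit direction)."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    d = R @ d_cam
    return t, d / np.linalg.norm(d)

def point_to_ray_distance(p, origin, direction):
    """Perpendicular distance from a 3D point to a ray (unit direction)."""
    v = p - origin
    return np.linalg.norm(v - np.dot(v, direction) * direction)

def map_to_ray_cost(map_points, rays):
    """Average distance from each predicted 3D map point to its nearest
    2D-label ray. A high cost flags map predictions unsupported by any
    2D semantic observation -- a simple stand-in for the alignment signal
    used to keep self-training from accumulating errors."""
    dists = [min(point_to_ray_distance(p, o, d) for o, d in rays)
             for p in map_points]
    return float(np.mean(dists))
```

For example, with identity camera pose and principal point (50, 50), the pixel (50, 50) yields a ray along the optical axis; a map point on that axis contributes zero cost, while a point offset by one meter laterally contributes a distance of one.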