🤖 AI Summary
To resolve the trade-off between the low geometric fidelity of grid-based (raster) representations and the instability of graph-based representations in online high-definition (HD) map generation, this paper proposes DiffSemanticFusion, a framework that fuses semantic raster bird's-eye-view (BEV) encoding with diffusion-based online HD map modeling. The semantic raster BEV provides a learnable, vision-friendly scene encoding, while a lightweight map diffusion module improves the stability and expressiveness of graph-structured map representations under sparse or noisy observations. The framework supports both multimodal trajectory prediction and planning-oriented end-to-end driving. On nuScenes, integrating it with an online-HD-map-informed QCNet yields a 5.1% improvement on the prediction task; on NAVSIM, it achieves state-of-the-art results, including a 15% performance gain in NavHard scenarios. The map diffusion module can also be plugged into other vector-based approaches to improve their performance.
📝 Abstract
Autonomous driving requires accurate scene understanding, including road geometry, traffic agents, and their semantic relationships. In online HD map generation scenarios, raster-based representations are well-suited to vision models but lack geometric precision, while graph-based representations retain structural detail but become unstable without precise maps. To harness the complementary strengths of both, we propose DiffSemanticFusion -- a fusion framework for multimodal trajectory prediction and planning. Our approach reasons over a semantic raster-fused BEV space, enhanced by a map diffusion module that improves both the stability and expressiveness of online HD map representations. We validate our framework on two downstream tasks: trajectory prediction and planning-oriented end-to-end autonomous driving. Experiments on real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate improved performance over several state-of-the-art methods. For the prediction task on nuScenes, we integrate DiffSemanticFusion with the online-HD-map-informed QCNet, achieving a 5.1% performance improvement. For end-to-end autonomous driving in NAVSIM, DiffSemanticFusion achieves state-of-the-art results, with a 15% performance gain in NavHard scenarios. In addition, extensive ablation and sensitivity studies show that our map diffusion module can be seamlessly integrated into other vector-based approaches to enhance performance. All artifacts are available at https://github.com/SunZhigang7/DiffSemanticFusion.
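To make the "map diffusion" idea concrete, the sketch below illustrates one common way such a module can be formulated: a DDPM-style forward process that corrupts the vertices of a map polyline with Gaussian noise, and an ancestral reverse process that denoises them step by step. This is a minimal toy illustration under assumed choices (linear beta schedule, a stand-in oracle denoiser in place of the trained, BEV-conditioned network); it is not the paper's actual architecture.

```python
import numpy as np

# DDPM-style sketch of refining noisy map polyline vertices.
# The schedule values and the oracle denoiser are illustrative
# assumptions, not DiffSemanticFusion's actual module.

def make_schedule(T=50, beta_start=1e-4, beta_end=0.05):
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    # Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def p_sample_step(xt, t, eps_hat, betas, alphas, alpha_bars, rng):
    # One reverse (denoising) step given predicted noise eps_hat.
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added at the final step
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return mean

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()

# Toy "map": 20 lane-centerline vertices in BEV (x, y), in metres.
x0 = np.stack([np.linspace(0.0, 50.0, 20), np.zeros(20)], axis=1)
xT = q_sample(x0, len(betas) - 1, alpha_bars, rng)

# Oracle denoiser: recovers the exact noise from x_t and x0. In a real
# module this would be a network conditioned on BEV scene features.
xt = xT
for t in reversed(range(len(betas))):
    eps_hat = (xt - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])
    xt = p_sample_step(xt, t, eps_hat, betas, alphas, alpha_bars, rng)
refined = xt
```

With the oracle denoiser the reverse chain recovers the clean polyline; with a trained, imperfectly accurate denoiser the same loop yields a refined estimate whose error shrinks as the steps proceed, which is the mechanism a map diffusion module would exploit to stabilize noisy online map vertices.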