DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion

📅 2025-08-03
🤖 AI Summary
Online high-definition (HD) map generation faces a trade-off: raster-based representations suit vision models but lack geometric precision, while graph-based representations retain structural detail but become unstable without precise maps. This paper proposes DiffSemanticFusion, a fusion framework that combines a semantic raster-fused bird's-eye view (BEV) space with a map diffusion module for online HD maps. The raster BEV provides a vision-friendly scene encoding, and the map diffusion module improves the stability and expressiveness of the online HD map representation. The framework is validated on two downstream tasks: multimodal trajectory prediction and planning-oriented end-to-end autonomous driving. On nuScenes, integrating DiffSemanticFusion with the online-HD-map-informed QCNet yields a 5.1% performance improvement on the prediction task; on NAVSIM, it achieves state-of-the-art results with a 15% performance gain in NavHard scenarios. Ablation and sensitivity studies indicate that the map diffusion module can also be integrated into other vector-based approaches.

📝 Abstract
Autonomous driving requires accurate scene understanding, including road geometry, traffic agents, and their semantic relationships. In online HD map generation scenarios, raster-based representations are well-suited to vision models but lack geometric precision, while graph-based representations retain structural detail but become unstable without precise maps. To harness the complementary strengths of both, we propose DiffSemanticFusion -- a fusion framework for multimodal trajectory prediction and planning. Our approach reasons over a semantic raster-fused BEV space, enhanced by a map diffusion module that improves both the stability and expressiveness of online HD map representations. We validate our framework on two downstream tasks: trajectory prediction and planning-oriented end-to-end autonomous driving. Experiments on real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate improved performance over several state-of-the-art methods. For the prediction task on nuScenes, we integrate DiffSemanticFusion with the online HD map informed QCNet, achieving a 5.1% performance improvement. For end-to-end autonomous driving in NAVSIM, DiffSemanticFusion achieves state-of-the-art results, with a 15% performance gain in NavHard scenarios. In addition, extensive ablation and sensitivity studies show that our map diffusion module can be seamlessly integrated into other vector-based approaches to enhance performance. All artifacts are available at https://github.com/SunZhigang7/DiffSemanticFusion.
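To make the raster side of this trade-off concrete, the following is a minimal sketch of rendering vectorized map elements into a semantic BEV grid with one channel per class. It is not the authors' implementation: the class names, grid size, extent, and the `rasterize` helper are all assumptions chosen for illustration.

```python
import math

# Hypothetical semantic classes; the paper's actual channel layout is not given here.
CLASSES = {"lane_divider": 0, "road_boundary": 1, "ped_crossing": 2}


def rasterize(polylines, size=8, extent=8.0):
    """Render (class_name, [(x, y), ...]) polylines into a [C][H][W] binary grid.

    The ego vehicle sits at the grid centre; `extent` is the side length in metres.
    """
    res = extent / size  # metres per cell
    grid = [[[0] * size for _ in range(size)] for _ in CLASSES]
    for name, pts in polylines:
        ch = CLASSES[name]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            # Sample each segment densely (about two samples per cell length).
            n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / (res / 2)))
            for i in range(n + 1):
                t = i / n
                x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
                # Shift ego-centred metres into cell indices; floor handles negatives.
                col = math.floor((x + extent / 2) / res)
                row = math.floor((y + extent / 2) / res)
                if 0 <= row < size and 0 <= col < size:
                    grid[ch][row][col] = 1
    return grid
```

The sketch also shows where the geometric precision goes: every point of a polyline is snapped to a cell of side `extent / size`, so detail finer than one cell cannot be recovered from the grid.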

Problem

Research questions and friction points this paper is trying to address.

Fuses raster and graph representations for autonomous driving
Improves stability and expressiveness of online HD maps
Enhances trajectory prediction and end-to-end driving performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic raster-fused BEV space for fusion
Map diffusion module enhances stability
Seamless integration with vector-based approaches
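
This page does not describe how the map diffusion module is built, so the following is only a sketch of the standard DDPM forward (noising) step applied to map polyline points, the kind of corruption a denoising network would be trained to undo. The linear beta schedule, the step count, and the helper names `alpha_bar` and `noise_polyline` are assumptions, not details from the paper.

```python
import math
import random


def alpha_bar(t, T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / max(1, T - 1)
        prod *= 1.0 - beta
    return prod


def noise_polyline(points, t, T=1000, rng=random):
    """Sample x_t ~ q(x_t | x_0) for a list of (x, y) map points.

    Returns the noised points and the Gaussian noise that a denoiser
    would be trained to predict.
    """
    ab = alpha_bar(t, T)
    a, b = math.sqrt(ab), math.sqrt(1.0 - ab)
    eps = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in points]
    noised = [
        (a * x + b * ex, a * y + b * ey)
        for (x, y), (ex, ey) in zip(points, eps)
    ]
    return noised, eps
```

At `t = 0` the points are returned unchanged, and as `t` approaches `T` they approach pure Gaussian noise. In practice the coordinates would be normalized before noising so that the unit-variance noise is on a comparable scale.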

Authors

Zhigang Sun
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China

Yiru Wang
University of Pittsburgh
Interests: Econometrics

Anqing Jiang
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China

Shuo Wang
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China

Yu Gao
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China

Yuwen Heng
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China

Shouyi Zhang
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China

An He
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China

Hao Jiang
Shanghai Jiaotong University, Shanghai, China

Jinhao Chai
School of Communication and Information Engineering, Shanghai University, Shanghai, China

Zichong Gu
School of Communication and Information Engineering, Shanghai University, Shanghai, China

Wang Jijun
AIR, Tsinghua University, Beijing, China

Shichen Tang
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China

Lavdim Halilaj
Corporate Research, Robert Bosch GmbH & Universum College
Interests: Neuro-Symbolic AI, Knowledge Engineering, Multi-modal Learning, Graph Embeddings, Data Integration

Juergen Luettin
Robert Bosch GmbH
Interests: machine learning, knowledge graphs, autonomous driving, graph neural networks, causality

Hao Sun
Bosch Corporate Research, Bosch (China) Investment Ltd., Shanghai, China