GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-sensor fusion is critical to the performance and robustness of end-to-end autonomous driving, yet existing approaches have complementary weaknesses: attention-based flatten fusion offers limited interpretability, while geometric BEV fusion incurs dense computational overhead. This paper proposes a Gaussian-based multi-modal fusion framework that employs physically parameterized 2D Gaussian units as compact, interpretable intermediate representations to unify heterogeneous sensor inputs (e.g., camera and LiDAR). The framework combines geometry-aware initialization, dual-stream fusion of explicit and implicit features, cross-modal alignment, and a cascaded planning head in which trajectory queries interact with the Gaussians. Evaluated on the NAVSIM and Bench2Drive benchmarks, the method improves trajectory prediction accuracy and robustness, particularly under challenging conditions such as occlusion and low light, while maintaining computational efficiency and model transparency.
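
To make the representation concrete, here is a minimal sketch of the per-Gaussian state the summary describes: physical parameters (position, scale, rotation) plus separate explicit and implicit feature vectors. It assumes PyTorch; the attribute names and dimensions are illustrative, not the paper's actual API.

```python
import torch

class Gaussian2D:
    """One Gaussian unit: physical parameters plus two feature streams."""
    def __init__(self, feat_dim: int = 128):
        self.mean = torch.zeros(2)       # (x, y) position on the BEV plane
        self.scale = torch.ones(2)       # per-axis standard deviations
        self.rotation = torch.zeros(1)   # orientation angle in radians
        self.explicit_feat = torch.zeros(feat_dim)  # semantic/spatial scene cues
        self.implicit_feat = torch.zeros(feat_dim)  # planning-oriented cues
```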

📝 Abstract
Multi-sensor fusion is crucial for improving the performance and robustness of end-to-end autonomous driving systems. Existing methods predominantly adopt either attention-based flatten fusion or bird's eye view fusion through geometric transformations. However, these approaches often suffer from limited interpretability or dense computational overhead. In this paper, we introduce GaussianFusion, a Gaussian-based multi-sensor fusion framework for end-to-end autonomous driving. Our method employs intuitive and compact Gaussian representations as intermediate carriers to aggregate information from diverse sensors. Specifically, we initialize a set of 2D Gaussians uniformly across the driving scene, where each Gaussian is parameterized by physical attributes and equipped with explicit and implicit features. These Gaussians are progressively refined by integrating multi-modal features. The explicit features capture rich semantic and spatial information about the traffic scene, while the implicit features provide complementary cues beneficial for trajectory planning. To fully exploit rich spatial and semantic information in Gaussians, we design a cascade planning head that iteratively refines trajectory predictions through interactions with Gaussians. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate the effectiveness and robustness of the proposed GaussianFusion framework. The source code will be released at https://github.com/Say2L/GaussianFusion.
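
The abstract's step of initializing "a set of 2D Gaussians uniformly across the driving scene" might look like the sketch below, with Gaussians laid out on a regular BEV grid. The grid extent, count, and feature width are assumed values; the paper's actual initialization (and its geometry-aware variant) may differ.

```python
import torch

def init_gaussians(n_per_axis: int = 32, extent: float = 50.0, feat_dim: int = 128):
    """Place n_per_axis**2 Gaussians on a uniform BEV grid over [-extent, extent] meters."""
    coords = torch.linspace(-extent, extent, n_per_axis)
    ys, xs = torch.meshgrid(coords, coords, indexing="ij")
    means = torch.stack([xs, ys], dim=-1).reshape(-1, 2)           # (N, 2) positions
    scales = torch.full((means.shape[0], 2), extent / n_per_axis)  # isotropic starting extent
    rotations = torch.zeros(means.shape[0], 1)                     # no initial orientation
    explicit = torch.zeros(means.shape[0], feat_dim)               # refined from sensor features
    implicit = torch.zeros(means.shape[0], feat_dim)               # refined from sensor features
    return means, scales, rotations, explicit, implicit
```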
Problem

Research questions and friction points this paper is trying to address.

Improving multi-sensor fusion for end-to-end autonomous driving systems
Addressing the limited interpretability and high computational overhead of existing fusion methods
Supplying trajectory planning with spatially and semantically rich intermediate representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Gaussian representations for sensor fusion
Integrates explicit and implicit Gaussian features
Employs a cascade planning head for iterative trajectory refinement (sketched below)
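
As a rough illustration of the cascade idea, the sketch below lets an ego planning query attend to Gaussian features over several stages, emitting a residual trajectory update at each stage. The layer choices (standard multi-head attention, a linear trajectory head) and all hyperparameters are assumptions, not the paper's actual head design.

```python
import torch
import torch.nn as nn

class CascadePlanner(nn.Module):
    """Iteratively refine a trajectory via query-to-Gaussian attention stages."""
    def __init__(self, dim: int = 128, n_stages: int = 3, horizon: int = 8):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            for _ in range(n_stages)
        )
        self.to_traj = nn.Linear(dim, horizon * 2)  # (x, y) offset per future step
        self.horizon = horizon

    def forward(self, query: torch.Tensor, gaussian_feats: torch.Tensor) -> torch.Tensor:
        # query: (B, 1, dim) ego planning token; gaussian_feats: (B, N, dim)
        traj = query.new_zeros(query.shape[0], self.horizon, 2)
        for attn in self.stages:
            query, _ = attn(query, gaussian_feats, gaussian_feats)
            traj = traj + self.to_traj(query).view(-1, self.horizon, 2)  # residual refinement
        return traj
```

Each stage adds a correction to the running trajectory, mirroring the abstract's description of a head that "iteratively refines trajectory predictions through interactions with Gaussians."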
👥 Authors
Shuai Liu
School of Computer Science and Engineering, Sun Yat-sen University
Quanmin Liang
Sun Yat-sen University (Multimodal, Embodied AI)
Zefeng Li
School of Computer Science and Engineering, Sun Yat-sen University
Boyang Li
School of Computer Science and Engineering, Sun Yat-sen University
Kai Huang
School of Computer Science and Engineering, Sun Yat-sen University