Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LiDAR-camera 3D detectors suffer severe performance degradation under critical sensor failures—such as LiDAR beam dropouts, modality interruptions, or occlusions—common in autonomous driving. To address this, we propose MoME, a modality-decoupled LiDAR-camera 3D detector. Its core innovations are the Multi-Expert Decoding (MED) framework and the Adaptive Query Router (AQR), which eliminate cross-modal dependencies and enable dynamic routing of each query to a unimodal (camera-only or LiDAR-only) or multimodal expert path. MoME comprises three parallel expert decoders—camera, LiDAR, and fusion—jointly optimizing robustness against sensor failures and detection accuracy. Evaluated on the nuScenes-R benchmark, MoME achieves state-of-the-art performance, notably maintaining superior accuracy under extreme weather conditions and diverse sensor failure scenarios, including simultaneous LiDAR and camera degradations.

📝 Abstract
Modern autonomous driving perception systems utilize complementary multi-modal sensors, such as LiDAR and cameras. Although sensor fusion architectures enhance performance in challenging environments, they still suffer significant performance drops under severe sensor failures, such as LiDAR beam reduction, LiDAR drop, limited field of view, camera drop, and occlusion. This limitation stems from inter-modality dependencies in current sensor fusion frameworks. In this study, we introduce an efficient and robust LiDAR-camera 3D object detector, referred to as MoME, which achieves robust performance through a mixture-of-experts approach. MoME fully decouples modality dependencies using three parallel expert decoders, which decode object queries from camera features, LiDAR features, or a combination of both, respectively. We propose the Multi-Expert Decoding (MED) framework, in which each query is decoded selectively by one of the three expert decoders. MoME utilizes an Adaptive Query Router (AQR) to select the most appropriate expert decoder for each query based on the quality of the camera and LiDAR features. This ensures that each query is processed by the best-suited expert, resulting in robust performance across diverse sensor failure scenarios. We evaluated MoME on the nuScenes-R benchmark, where it achieved state-of-the-art performance in extreme weather and sensor failure conditions, significantly outperforming existing models across various sensor failure scenarios.
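The routing idea in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper's AQR is a learned module operating on camera and LiDAR features, whereas the `route_query` rule below is a hypothetical threshold-based stand-in using scalar quality scores, introduced purely to show how per-query expert selection decouples the modalities; all names here are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of per-query expert routing in the spirit of MoME's
# Adaptive Query Router (AQR). The real router is learned; this threshold
# rule on scalar "feature quality" scores is an illustrative stand-in.

EXPERTS = ("camera", "lidar", "fusion")

def route_query(cam_quality: float, lidar_quality: float,
                threshold: float = 0.5) -> str:
    """Pick an expert decoder for one query from modality quality scores.

    If both modalities look reliable, route to the fusion expert; if only
    one does, fall back to the corresponding unimodal expert, so a failed
    sensor never contaminates the decoding of that query.
    """
    cam_ok = cam_quality >= threshold
    lidar_ok = lidar_quality >= threshold
    if cam_ok and lidar_ok:
        return "fusion"
    if lidar_ok:
        return "lidar"
    if cam_ok:
        return "camera"
    # Both degraded: fall back to the stronger single modality.
    return "lidar" if lidar_quality >= cam_quality else "camera"

def decode_queries(queries, cam_scores, lidar_scores):
    """Group object queries by the expert decoder that should handle them."""
    buckets = {name: [] for name in EXPERTS}
    for q, c, l in zip(queries, cam_scores, lidar_scores):
        buckets[route_query(c, l)].append(q)
    return buckets

# Example: query 0 sees healthy sensors, query 1 a LiDAR dropout,
# query 2 a camera occlusion.
routed = decode_queries([0, 1, 2], [0.9, 0.8, 0.2], [0.9, 0.1, 0.7])
```

Because each query is handled by exactly one expert, a dropped modality only degrades the queries that depended on it, which is the robustness property the abstract attributes to MED.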
Problem

Research questions and friction points this paper is trying to address.

Resilient sensor fusion under severe failures
Decoupling modality dependencies in fusion frameworks
Robust 3D object detection in adverse conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of experts for robust fusion
Three parallel expert decoders
Adaptive Query Router for expert selection