Layer-Wise Modality Decomposition for Interpretable Multimodal Sensor Fusion

📅 2025-11-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal fusion models for autonomous driving behave as decision-making black boxes, making it difficult to quantify how much each sensor modality (e.g., camera, radar, LiDAR) contributes to a prediction at each network layer. To address this, we propose a post-hoc, model-agnostic, layer-wise modality attribution method, to our knowledge the first for autonomous-driving sensor fusion. Our approach couples layer-wise modality decomposition with structured perturbation analysis to yield interpretable, architecture-agnostic fusion diagnostics. It supports diverse input configurations, including camera-radar, camera-LiDAR, and tri-modal fusion, and, being post-hoc, leaves high-capacity pretrained models unmodified. The method delivers per-layer quantification of modality contributions alongside intuitive, modality-wise visualizations. Extensive experiments demonstrate its effectiveness and generalizability across multiple fusion paradigms and benchmarks. Code is publicly available.
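
The summary does not spell out the paper's exact decomposition rule, but the perturbation side can be illustrated. Below is a minimal PyTorch sketch, not the authors' released code: it hooks named layers of a pretrained fusion model and scores each modality at each layer by the activation shift that a structured perturbation of that modality induces. `LayerwiseModalityProbe`, the layer names, and the `perturb` callback are all hypothetical, and the model's forward is assumed to take one tensor per modality as a keyword argument.

```python
import torch
import torch.nn as nn

class LayerwiseModalityProbe:
    """Hypothetical sketch: per-layer, per-modality contribution scores
    obtained by comparing activations under clean vs. modality-perturbed
    inputs. Assumes `model` is in eval mode and each hooked layer
    returns a single tensor."""

    def __init__(self, model: nn.Module, layer_names):
        self.model = model
        self._acts = {}
        self._handles = [
            module.register_forward_hook(self._make_hook(name))
            for name, module in model.named_modules()
            if name in layer_names
        ]

    def _make_hook(self, name):
        def hook(_module, _inputs, output):
            self._acts[name] = output.detach()  # cache this layer's output
        return hook

    @torch.no_grad()
    def contributions(self, inputs: dict, perturb):
        self.model(**inputs)                        # clean reference pass
        ref = {name: act.clone() for name, act in self._acts.items()}

        scores = {}
        for modality in inputs:                     # e.g. "camera", "radar", "lidar"
            perturbed = dict(inputs)
            perturbed[modality] = perturb(inputs[modality])  # structured perturbation
            self.model(**perturbed)
            # Score: relative activation shift this perturbation causes per layer.
            scores[modality] = {
                name: ((self._acts[name] - ref[name]).norm()
                       / (ref[name].norm() + 1e-8)).item()
                for name in ref
            }
        return scores

    def close(self):
        for handle in self._handles:
            handle.remove()
```

Under this reading, a layer where the camera score dominates is one whose representation is driven mostly by camera evidence, which is the kind of per-layer diagnostic the summary describes.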

📝 Abstract
In autonomous driving, transparency in the decision-making of perception models is critical, as even a single misperception can be catastrophic. Yet with multi-sensor inputs, it is difficult to determine how each modality contributes to a prediction because sensor information becomes entangled within the fusion network. We introduce Layer-Wise Modality Decomposition (LMD), a post-hoc, model-agnostic interpretability method that disentangles modality-specific information across all layers of a pretrained fusion model. To our knowledge, LMD is the first approach to attribute the predictions of a perception model to individual input modalities in a sensor-fusion system for autonomous driving. We evaluate LMD on pretrained fusion models under camera-radar, camera-LiDAR, and camera-radar-LiDAR settings for autonomous driving. Its effectiveness is validated using structured perturbation-based metrics and modality-wise visual decompositions, demonstrating practical applicability to interpreting high-capacity multimodal architectures. Code is available at https://github.com/detxter-jvb/Layer-Wise-Modality-Decomposition.
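
The abstract's "structured perturbation-based metrics" are not specified in detail here; a common faithfulness check in this family deletes modalities in order of attributed importance and tracks how far the prediction moves. A minimal sketch under that assumption, with all argument names illustrative:

```python
import torch

@torch.no_grad()
def modality_deletion_curve(model, inputs, baselines, ranking):
    """Replace modalities with uninformative baselines in order of
    attributed importance; a faithful ranking should shift the output
    sharply for the earliest deletions.

    inputs/baselines: dicts mapping modality name -> tensor
    ranking: modality names, most- to least-important
    """
    reference = model(**inputs)
    shifts, ablated = [], dict(inputs)
    for modality in ranking:
        ablated[modality] = baselines[modality]   # e.g. zeros or noise
        shifts.append((model(**ablated) - reference).norm().item())
    return shifts
```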
Problem

Research questions and friction points this paper is trying to address.

Disentangling modality contributions in multimodal sensor fusion models
Providing interpretability for autonomous driving perception systems
Attributing predictions to individual sensor modalities post-hoc
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-Wise Modality Decomposition disentangles modality-specific information
Method is post-hoc and model-agnostic, applying to pretrained fusion models without retraining (see the sketch after this list)
Attributes predictions to individual input modalities in sensor-fusion systems
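
To make "post-hoc and model-agnostic" concrete, here is a self-contained toy example, with an invented two-modality network standing in for a real pretrained camera-radar model: it needs only forward access and occludes one modality at a time, which is the access pattern a post-hoc attribution method relies on.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained camera-radar fusion model; the point is
# that only forward access and the input tensors are required.
class TinyFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.cam_enc = nn.Linear(16, 8)
        self.rad_enc = nn.Linear(4, 8)
        self.head = nn.Linear(8, 2)

    def forward(self, camera, radar):
        fused = torch.relu(self.cam_enc(camera)) + torch.relu(self.rad_enc(radar))
        return self.head(fused)

model = TinyFusion().eval()
inputs = {"camera": torch.randn(1, 16), "radar": torch.randn(1, 4)}

with torch.no_grad():
    ref = model(**inputs)
    for mod in inputs:                              # occlude one modality at a time
        ablated = {k: (torch.zeros_like(v) if k == mod else v)
                   for k, v in inputs.items()}
        shift = (model(**ablated) - ref).norm().item()
        print(f"{mod}: output shift {shift:.3f}")
```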