Multispectral Detection Transformer with Infrared-Centric Sensor Fusion

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multispectral object detection aims to improve robustness in complex environments by fusing visible (RGB) and infrared (IR) modalities. This paper proposes IC-Fusion, a lightweight IR-centric fusion framework in which IR features guide the fusion. The method introduces a Multi-Scale Feature Distillation (MSFD) block that enhances RGB features and a three-stage fusion block built from a Cross-Modal Channel Shuffle Gate (CCSG) and a Cross-Modal Large Kernel Gate (CLKG). Key ingredients are a wavelet-motivated view of modality-specific frequency content, a compact RGB backbone, Transformer-style cross-modal gated fusion, and large-kernel attention. On the FLIR and LLVIP benchmarks, IC-Fusion is reported to reach state-of-the-art detection accuracy with fewer parameters and FLOPs than prevailing methods. The source code is publicly available.
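The summary mentions large-kernel gating without spelling out the operator. A minimal NumPy sketch of one plausible form, assuming (C, H, W) feature maps: a depthwise large-kernel filter over the IR features produces a sigmoid gate that modulates the RGB features, and the IR features are added back as a residual. This is a hypothetical construction, not the authors' CLKG; `large_kernel_gate`, its residual design, and the choice of which modality drives the gate are assumptions.

```python
import numpy as np


def depthwise_conv2d_same(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Depthwise 2D cross-correlation with zero 'same' padding.

    x: (C, H, W) feature map; kernels: (C, k, k) with odd k.
    Each channel is filtered only by its own k x k kernel.
    """
    c, h, w = x.shape
    kc, k, k2 = kernels.shape
    if kc != c or k != k2 or k % 2 == 0:
        raise ValueError("kernels must have shape (C, k, k) with odd k")
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H, W unchanged
    out = np.zeros((c, h, w), dtype=np.float64)
    for i in range(k):
        for j in range(k):
            # Add the shifted input weighted by tap (i, j), separately per channel.
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def large_kernel_gate(f_rgb: np.ndarray, f_ir: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Hypothetical cross-modal large-kernel gate (assumption, not the paper's CLKG).

    A large-kernel depthwise filter over the IR features yields a spatial gate
    in (0, 1) that rescales the RGB features; the IR features are added back.
    """
    gate = sigmoid(depthwise_conv2d_same(f_ir, kernels))
    return f_rgb * gate + f_ir
```

Depthwise filtering keeps the weight count linear in the number of channels (C * k * k), which is the usual reason large kernels such as 7x7 stay affordable in lightweight designs.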

📝 Abstract

Multispectral object detection aims to leverage complementary information from visible (RGB) and infrared (IR) modalities to enable robust performance under diverse environmental conditions. In this letter, we propose IC-Fusion, a multispectral object detector that effectively fuses visible and infrared features through a lightweight and modality-aware design. Motivated by wavelet analysis and empirical observations, we find that IR images contain structurally rich high-frequency information critical for object localization, while RGB images provide complementary semantic context. To exploit this, we adopt a compact RGB backbone and design a novel fusion module comprising a Multi-Scale Feature Distillation (MSFD) block to enhance RGB features and a three-stage fusion block with Cross-Modal Channel Shuffle Gate (CCSG) and Cross-Modal Large Kernel Gate (CLKG) to facilitate effective cross-modal interaction. Experiments on the FLIR and LLVIP benchmarks demonstrate the effectiveness and efficiency of our IR-centric fusion strategy. Our code is available at https://github.com/smin-hwang/IC-Fusion.
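The wavelet motivation can be illustrated with a small measurement. The sketch below assumes a single-level orthonormal Haar transform on a grayscale image (the paper's exact wavelet and analysis protocol are not given here) and computes the fraction of image energy that falls in the three high-frequency subbands. Comparing this ratio on paired RGB (converted to grayscale) and IR frames is one hypothetical way to probe the claim that IR carries more structural high-frequency content; `haar_dwt2` and `high_freq_ratio` are illustrative names.

```python
import numpy as np


def haar_dwt2(img: np.ndarray):
    """Single-level orthonormal 2D Haar DWT of a grayscale image with even height and width.

    Returns (LL, LH, HL, HH), each half the input size in both dimensions.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average (low frequency)
    lh = (a + b - c - d) / 2.0  # top minus bottom: responds to horizontal edges
    hl = (a - b + c - d) / 2.0  # left minus right: responds to vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def high_freq_ratio(img: np.ndarray) -> float:
    """Fraction of total energy in the LH, HL and HH subbands (0 for a flat image)."""
    ll, lh, hl, hh = haar_dwt2(img)
    high = float((lh ** 2).sum() + (hl ** 2).sum() + (hh ** 2).sum())
    total = high + float((ll ** 2).sum())
    return high / total if total > 0 else 0.0
```

Because the transform is orthonormal, the subband energies sum to the input energy, so the ratio always lies in [0, 1]: a constant image scores 0 and a +1/-1 checkerboard scores 1.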

Problem

Research questions and friction points this paper is trying to address.

Leveraging RGB and IR data for robust object detection

Enhancing RGB features with IR's structural information

Improving cross-modal fusion efficiency and effectiveness
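Efficiency claims for lightweight fusion usually come down to parameter counts. As a generic illustration (not figures from the paper), a dense k x k convolution has C_in * C_out * k^2 weights, while a depthwise one has C * k^2; at 256 channels and k = 7 that is 3,211,264 versus 12,544 weights, a factor of 256. The helper names below are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weight count of a dense k x k convolution mapping c_in to c_out channels."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_conv_params(c: int, k: int, bias: bool = False) -> int:
    """Weight count of a depthwise k x k convolution (one k x k filter per channel)."""
    return c * k * k + (c if bias else 0)


if __name__ == "__main__":
    c, k = 256, 7
    dense = standard_conv_params(c, c, k)
    dw = depthwise_conv_params(c, k)
    print(dense, dw, dense // dw)  # 3211264 12544 256
```

The ratio between the two equals the channel count, which is why depthwise large kernels scale well as feature width grows.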

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight modality-aware fusion design

Multi-Scale Feature Distillation block

Cross-Modal Channel Shuffle Gate
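The channel-shuffle idea can be sketched as follows, assuming ShuffleNet-style channel shuffle applied to the concatenated RGB and IR feature maps, followed by a per-channel sigmoid gate computed from globally pooled features. The internals of the authors' CCSG are not described here, so the split-and-gate design and the names `channel_shuffle` and `cross_modal_shuffle_gate` are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """ShuffleNet-style shuffle on (C, H, W): view as (groups, C // groups, H, W),
    swap the first two axes, and flatten back, interleaving channels across groups."""
    c, h, w = x.shape
    if c % groups:
        raise ValueError("channel count must be divisible by groups")
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


def cross_modal_shuffle_gate(f_rgb: np.ndarray, f_ir: np.ndarray) -> np.ndarray:
    """Hypothetical cross-modal shuffle gate (assumption, not the paper's CCSG).

    f_rgb, f_ir: (C, H, W) with even C. Returns a (C, H, W) fused map.
    """
    x = np.concatenate([f_rgb, f_ir], axis=0)  # (2C, H, W): RGB channels, then IR channels
    x = channel_shuffle(x, groups=2)           # order becomes rgb0, ir0, rgb1, ir1, ...
    a, b = np.split(x, 2, axis=0)              # each half now mixes both modalities
    gate = sigmoid(b.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1) channel gate from pooled b
    return a * gate
```

With two groups the shuffle alternates RGB and IR channels, so each half of the split already contains both modalities before the gate is applied.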
