Multispectral Detection Transformer with Infrared-Centric Sensor Fusion

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multispectral object detection aims to improve robustness in complex environments by fusing visible (RGB) and infrared (IR) modalities. This paper proposes IC-Fusion, a lightweight fusion framework built around an IR-centric fusion strategy. The method introduces a Multi-Scale Feature Distillation (MSFD) block that enhances RGB features and a three-stage fusion block incorporating a Cross-Modal Channel Shuffle Gate (CCSG) and a Cross-Modal Large Kernel Gate (CLKG). Key elements include wavelet-motivated, modality-specific feature modeling, a compact RGB backbone, and gated cross-modal fusion with large-kernel operations. On the FLIR and LLVIP benchmarks, IC-Fusion is reported to achieve state-of-the-art detection accuracy with fewer parameters and FLOPs than prevailing methods. The source code is publicly available.

📝 Abstract
Multispectral object detection aims to leverage complementary information from visible (RGB) and infrared (IR) modalities to enable robust performance under diverse environmental conditions. In this letter, we propose IC-Fusion, a multispectral object detector that effectively fuses visible and infrared features through a lightweight and modality-aware design. Through wavelet analysis and empirical observation, we find that IR images contain structurally rich high-frequency information critical for object localization, while RGB images provide complementary semantic context. To exploit this, we adopt a compact RGB backbone and design a novel fusion module comprising a Multi-Scale Feature Distillation (MSFD) block to enhance RGB features and a three-stage fusion block with a Cross-Modal Channel Shuffle Gate (CCSG) and a Cross-Modal Large Kernel Gate (CLKG) to facilitate effective cross-modal interaction. Experiments on the FLIR and LLVIP benchmarks demonstrate the effectiveness and efficiency of our IR-centric fusion strategy. Our code is available at https://github.com/smin-hwang/IC-Fusion.
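The wavelet-based observation in the abstract can be illustrated with a minimal sketch. A one-level 2D Haar transform splits an image into one low-frequency band and three detail bands, and the share of energy in the detail bands gives a rough measure of high-frequency content. The function names (haar_dwt2, band_energy, high_freq_ratio) and the energy-ratio measure are illustrative assumptions, not the authors' analysis code, and the abstract does not state which wavelet was used.

```python
def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform.

    img is a grayscale image given as a list of equal-length rows with
    even height and width. Returns (approx, horiz, vert, diag) bands,
    each half the size of the input in both dimensions.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, h, 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for c in range(0, w, 2):
            # 2x2 block: a b / d e
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_a.append((a + b + d + e) / 2.0)  # low-frequency average
            row_h.append((a - b + d - e) / 2.0)  # left-right difference
            row_v.append((a + b - d - e) / 2.0)  # top-bottom difference
            row_d.append((a - b - d + e) / 2.0)  # diagonal difference
        approx.append(row_a)
        horiz.append(row_h)
        vert.append(row_v)
        diag.append(row_d)
    return approx, horiz, vert, diag


def band_energy(band):
    """Sum of squared coefficients in one band."""
    return sum(v * v for row in band for v in row)


def high_freq_ratio(img):
    """Fraction of total energy that falls in the three detail bands."""
    approx, *details = haar_dwt2(img)
    low = band_energy(approx)
    high = sum(band_energy(b) for b in details)
    total = low + high
    return high / total if total > 0 else 0.0
```

Under these assumptions, computing high_freq_ratio on a registered IR image and on the grayscale version of its RGB counterpart would give one crude way to compare their high-frequency content; the paper's own analysis may differ.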
Problem

Research questions and friction points this paper is trying to address.

Leveraging RGB and IR data for robust object detection
Enhancing RGB features with IR's structural information
Improving cross-modal fusion efficiency and effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight modality-aware fusion design
Multi-Scale Feature Distillation block
Cross-Modal Channel Shuffle Gate
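
This page does not describe the internals of the Cross-Modal Channel Shuffle Gate, so the following is only a sketch of the two generic ingredients its name suggests: a ShuffleNet-style channel shuffle that interleaves channels from two modalities, and a sigmoid gate. The gating rule here (the IR channel's global mean controls how much of the RGB channel is added) and the names channel_shuffle and ir_gated_mix are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape, transpose, flatten).

    channels is a list whose length is divisible by groups. With RGB
    channels followed by IR channels and groups=2, the output alternates
    RGB and IR channels.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ir_gated_mix(rgb, ir):
    """Hypothetical IR-guided gate, not the paper's CCSG.

    rgb and ir are lists of channels, each channel a flat list of floats.
    For each channel pair, the IR channel's global mean is passed through
    a sigmoid and scales the RGB contribution added to the IR feature.
    """
    fused = []
    for rgb_ch, ir_ch in zip(rgb, ir):
        gate = sigmoid(sum(ir_ch) / len(ir_ch))
        fused.append([i + gate * r for r, i in zip(rgb_ch, ir_ch)])
    return fused
```

A learned implementation would typically replace the fixed global-mean gate with trainable layers; this sketch has no parameters and only shows the data flow.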
Seongmin Hwang
Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
Daeyoung Han
School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
Moongu Jeon
Gwangju Institute of Science and Technology
Artificial intelligence, Machine learning, Computer vision, Autonomous driving