🤖 AI Summary
Multispectral object detection aims to improve robustness in complex environments by fusing visible (RGB) and infrared (IR) modalities. This paper proposes IC-Fusion, a lightweight, IR-centric fusion framework in which IR features guide the fusion. The method introduces a Multi-Scale Feature Distillation (MSFD) block that enhances RGB features and a three-stage fusion block that combines a Cross-Modal Channel Shuffle Gate (CCSG) with a Cross-Modal Large Kernel Gate (CLKG). Key innovations include wavelet-inspired modality-specific feature modeling, a lightweight RGB backbone, Transformer-style cross-modal gated fusion, and large-kernel attention mechanisms. On the FLIR and LLVIP benchmarks, IC-Fusion is reported to achieve state-of-the-art detection accuracy with substantially fewer parameters and FLOPs than prevailing methods. The source code is publicly available.
📝 Abstract
Multispectral object detection aims to leverage complementary information from visible (RGB) and infrared (IR) modalities to enable robust performance under diverse environmental conditions. In this letter, we propose IC-Fusion, a multispectral object detector that effectively fuses visible and infrared features through a lightweight and modality-aware design. Through wavelet analysis and empirical observation, we find that IR images contain structurally rich high-frequency information critical for object localization, while RGB images provide complementary semantic context. To exploit this, we adopt a compact RGB backbone and design a novel fusion module comprising a Multi-Scale Feature Distillation (MSFD) block, which enhances RGB features, and a three-stage fusion block with a Cross-Modal Channel Shuffle Gate (CCSG) and a Cross-Modal Large Kernel Gate (CLKG), which facilitates effective cross-modal interaction. Experiments on the FLIR and LLVIP benchmarks demonstrate the effectiveness and efficiency of our IR-centric fusion strategy. Our code is available at https://github.com/smin-hwang/IC-Fusion.
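The wavelet observation in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' analysis: the abstract does not say which wavelet or metric was used, so the one-level Haar transform, the energy-ratio measure, and the names `haar_level1` and `high_freq_ratio` are all assumptions made for illustration. The sketch splits an image into one low-pass and three detail sub-bands and reports what fraction of the signal energy falls in the detail (high-frequency) bands.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def haar_level1(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One-level orthonormal 2D Haar transform (assumed wavelet, for illustration).

    Returns (approx, d_cols, d_rows, d_diag): the low-pass approximation and
    three detail sub-bands computed over non-overlapping 2x2 blocks.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_a, r_c, r_r, r_d = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2.0)  # block average (low-pass)
            r_c.append((a - b + c - d) / 2.0)  # change between adjacent columns
            r_r.append((a + b - c - d) / 2.0)  # change between adjacent rows
            r_d.append((a - b - c + d) / 2.0)  # diagonal change
        approx.append(r_a)
        d_cols.append(r_c)
        d_rows.append(r_r)
        d_diag.append(r_d)
    return approx, d_cols, d_rows, d_diag


def energy(band: Image) -> float:
    """Sum of squared coefficients in a sub-band."""
    return sum(v * v for row in band for v in row)


def high_freq_ratio(img: Image) -> float:
    """Fraction of total energy carried by the three detail sub-bands."""
    approx, d_cols, d_rows, d_diag = haar_level1(img)
    high = energy(d_cols) + energy(d_rows) + energy(d_diag)
    total = energy(approx) + high
    return high / total if total > 0 else 0.0


if __name__ == "__main__":
    # Toy inputs only (hypothetical, not taken from FLIR or LLVIP).
    flat = [[1.0, 1.0], [1.0, 1.0]]   # no structure at all
    edge = [[0.0, 1.0], [0.0, 1.0]]   # one sharp vertical edge
    print(high_freq_ratio(flat), high_freq_ratio(edge))
```

Under these assumptions, one could compute this ratio for paired IR and RGB crops of the same scene and compare them; a consistently higher value for IR would agree with the abstract's observation, though the paper's own analysis may differ in wavelet, scale, and metric.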