Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection

📅 2024-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing RGB-T salient object detection (SOD) models face two key bottlenecks: (1) Transformer-based architectures incur quadratic computational cost, hindering efficient high-resolution dual-modality feature fusion; and (2) even at convergence, a frequency gap persists between predictions and ground truth, misaligning high-frequency details. To address these, the authors propose DFENet, a purely Fourier transform-based network in which feature decomposition, enhancement, and cross-modal alignment are all performed via the fast Fourier transform (FFT) with linear complexity. Its core components are the Modal-coordinated Perception Attention, the Frequency-decomposed Edge-aware Block, the Fourier Residual Channel Attention Block, and a Co-focus Frequency Loss. DFENet outperforms fifteen state-of-the-art RGB-T SOD models on four benchmark datasets, and the code is publicly available.

📝 Abstract
The rapid development of deep learning has significantly improved salient object detection (SOD) combining both RGB and thermal (RGB-T) images. However, existing deep learning-based RGB-T SOD models suffer from two major limitations. First, Transformer-based models with quadratic complexity are computationally expensive and memory-intensive, limiting their application in high-resolution bi-modal feature fusion. Second, even when these models converge to an optimal solution, there remains a frequency gap between the prediction and ground-truth. To overcome these limitations, we propose a purely Fourier transform-based model, namely Deep Fourier-Embedded Network (DFENet), for accurate RGB-T SOD. To address the computational complexity when dealing with high-resolution images, we leverage the efficiency of fast Fourier transform with linear complexity to design three key components: (1) the Modal-coordinated Perception Attention, which fuses RGB and thermal modalities with enhanced multi-dimensional representation; (2) the Frequency-decomposed Edge-aware Block, which clarifies object edges by deeply decomposing and enhancing frequency components of low-level features; and (3) the Fourier Residual Channel Attention Block, which prioritizes high-frequency information while aligning channel-wise global relationships. To mitigate the frequency gap, we propose Co-focus Frequency Loss, which dynamically weights hard frequencies during edge frequency reconstruction by cross-referencing bi-modal edge information in the Fourier domain. Extensive experiments on four RGB-T SOD benchmark datasets demonstrate that DFENet outperforms fifteen existing state-of-the-art RGB-T SOD models. Comprehensive ablation studies further validate the value and effectiveness of our newly proposed components. The code is available at https://github.com/JoshuaLPF/DFENet.
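The "frequency gap" described above can be made concrete with a toy frequency-domain loss: compare the prediction and ground truth in the Fourier domain, and weight poorly reconstructed ("hard") frequencies more heavily. The sketch below is only an illustration of this general idea in NumPy; the function name, the focal-style weighting, and the `alpha` parameter are assumptions, not the paper's actual Co-focus Frequency Loss, which additionally cross-references bi-modal edge information.

```python
import numpy as np

def frequency_loss(pred, gt, alpha=1.0):
    """Toy frequency-domain loss: per-frequency squared error on the
    2D FFT, reweighted by the (normalized) error itself so that hard
    frequencies contribute more -- a simplified focal-style weighting."""
    Fp = np.fft.fft2(pred)
    Fg = np.fft.fft2(gt)
    diff = np.abs(Fp - Fg) ** 2              # per-frequency squared error
    weight = diff ** alpha                   # emphasize hard frequencies
    weight = weight / (weight.max() + 1e-8)  # normalize weights to [0, 1]
    return float(np.mean(weight * diff))
```

For identical maps the loss is exactly zero; any spatial-domain mismatch, especially in fine high-frequency detail, raises it.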
Problem

Research questions and friction points this paper is trying to address.

Overcome the computational complexity of high-resolution bi-modal fusion in RGB-T SOD.
Close the frequency gap between predictions and ground truth.
Improve the efficiency of multi-modal feature fusion.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a purely Fourier transform-based model for RGB-T salient object detection.
Implements Modal-coordinated Perception Attention for bi-modal fusion.
Introduces Co-focus Frequency Loss for edge frequency reconstruction.
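The efficiency claim behind these components rests on a standard identity: an elementwise product in the Fourier domain is a global (circular) convolution in the spatial domain, computed in O(N log N) via the FFT instead of quadratic-cost self-attention. The NumPy sketch below illustrates that general pattern only; it is not one of DFENet's actual blocks, and the function and filter names are hypothetical.

```python
import numpy as np

def fourier_mix(feat, filt):
    """Mix a 2D feature map globally in the Fourier domain:
    FFT -> elementwise filter -> inverse FFT. This equals a global
    circular convolution in the spatial domain at O(N log N) cost."""
    F = np.fft.fft2(feat)
    out = np.fft.ifft2(F * filt)
    return out.real  # imaginary part is ~0 for a symmetric real filter

# An all-ones filter is the identity: the feature map passes through
# unchanged, which makes the FFT/inverse-FFT round trip easy to verify.
```

In a learned frequency-domain block, `filt` would be a trainable per-frequency weight; here it is fixed purely for illustration.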
Pengfei Lyu
Ph.D. student at Northeastern University
Machine Learning · Computer Vision · Multi-modal Image Processing
Xiaosheng Yu
Faculty of Robot Science and Engineering, Northeastern University, Shenyang, 110169 China
Chengdong Wu
Faculty of Robot Science and Engineering, Northeastern University, Shenyang, 110169 China
Jagath C. Rajapakse
School of Computer Science and Engineering, Nanyang Technological University, 639798 Singapore