Spectral-Aware Global Fusion for RGB-Thermal Semantic Segmentation

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the insufficient robustness of RGB-only semantic segmentation under low-light and occlusion conditions, this paper proposes SGFNet, a spectral-aware global multimodal fusion network. The method explicitly decouples RGB and thermal infrared features from a spectral perspective—separating high-frequency components (e.g., edges and textures) from low-frequency contextual information—and models their cross-modal interactions. A high-frequency-guided global attention mechanism is further introduced to synergistically enhance both structural details and semantic context. SGFNet is end-to-end trainable and achieves state-of-the-art performance on the MFNet and PST900 benchmarks, with significant improvements in segmentation accuracy and robustness under challenging environmental conditions. Key contributions include: (i) a spectral decomposition framework for multimodal feature disentanglement in semantic segmentation; (ii) explicit high-frequency interaction modeling across modalities; and (iii) a global attention fusion strategy that jointly optimizes contextual coherence and fine-grained detail preservation.
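The spectral decoupling described above can be illustrated with a minimal sketch. This is not the paper's actual architecture (SGFNet's decomposition is learned inside a deep network); it only demonstrates the underlying idea, assuming a box blur as the low-pass filter: the low-frequency part of a feature map is a local average capturing smooth context, and the high-frequency part is the residual capturing edges and textures.

```python
import numpy as np

def decompose_spectral(feat, kernel=3):
    """Split a feature map of shape (H, W, C) into low- and high-frequency parts.

    Low frequency: local box-blur average (broad scene context, smooth areas).
    High frequency: residual after subtracting the blur (edges, textures).
    """
    pad = kernel // 2
    # Edge-replicate padding so the blur is defined at the borders.
    padded = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    low = np.zeros_like(feat, dtype=np.float64)
    H, W, _ = feat.shape
    for i in range(H):
        for j in range(W):
            # Mean over the kernel x kernel spatial window, per channel.
            low[i, j] = padded[i:i + kernel, j:j + kernel].mean(axis=(0, 1))
    high = feat - low
    return low, high
```

By construction the two components sum back to the original features, so a perfectly smooth input yields a near-zero high-frequency part, while edges and textures concentrate in the residual.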

📝 Abstract
Semantic segmentation relying solely on RGB data often struggles in challenging conditions such as low illumination and obscured views, limiting its reliability in critical applications like autonomous driving. To address this, integrating additional thermal radiation data with RGB images demonstrates enhanced performance and robustness. However, how to effectively reconcile the modality discrepancies and fuse the RGB and thermal features remains a well-known challenge. In this work, we address this challenge from a novel spectral perspective. We observe that the multi-modal features can be categorized into two spectral components: low-frequency features that provide broad scene context, including color variations and smooth areas, and high-frequency features that capture modality-specific details such as edges and textures. Inspired by this, we propose the Spectral-aware Global Fusion Network (SGFNet) to effectively enhance and fuse the multi-modal features by explicitly modeling the interactions between the high-frequency, modality-specific features. Our experimental results demonstrate that SGFNet outperforms the state-of-the-art methods on the MFNet and PST900 datasets.
Problem

Research questions and friction points this paper is trying to address.

RGB-only segmentation fails under low illumination and occluded views
Fusing RGB and thermal data improves robustness, but reconciling modality discrepancies remains challenging
How to design a spectral-perspective fusion that handles these discrepancies effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral-aware global fusion of RGB and thermal features
Categorizes multimodal features into low- and high-frequency components
Explicitly models cross-modal interactions between high-frequency, modality-specific features
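The cross-modal high-frequency interaction listed above can be sketched with a toy single-head attention step. This is an illustrative assumption, not SGFNet's actual fusion module: here RGB high-frequency tokens query thermal high-frequency tokens, and the result enhances a shared low-frequency context. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hf_guided_fusion(rgb_high, thermal_high, context):
    """Toy high-frequency-guided fusion.

    rgb_high, thermal_high, context: arrays of shape (N, C), where N is the
    number of spatial tokens and C the channel dimension.
    """
    d = rgb_high.shape[1]
    # Cross-modal attention: each RGB high-freq token attends to all
    # thermal high-freq tokens (scaled dot-product scores).
    attn = softmax(rgb_high @ thermal_high.T / np.sqrt(d), axis=-1)
    interacted = attn @ thermal_high
    # Inject the interacted detail back into the shared low-frequency context.
    return context + rgb_high + interacted
```

The design choice mirrored here is that detail (high-frequency) features drive the attention while the smooth context is preserved via a residual connection, so fusion sharpens edges without discarding global scene information.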