MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation

📅 2025-10-12
🤖 AI Summary
Cloud contamination in optical satellite imagery degrades the accuracy of environmental monitoring and land cover classification. To address this, we propose MSCloudCAM, a multispectral, multi-sensor cloud segmentation network that uses Sentinel-2 and Landsat-8 data to segment imagery into four classes: clear sky, thin cloud, thick cloud, and cloud shadow. The method uses a cross-attention block to fuse multisensor and multispectral features, combined with a Swin Transformer backbone, ASPP and PSP multi-scale context modules, and channel and spatial attention modules that refine the feature representations. Evaluated on the CloudSEN12 and L8Biome benchmarks, the model reports state-of-the-art segmentation accuracy while maintaining competitive parameter efficiency and FLOPs relative to leading baselines, which makes it a practical candidate for large-scale remote sensing applications.

📝 Abstract
Clouds remain a critical challenge in optical satellite imagery, hindering reliable analysis for environmental monitoring, land cover mapping, and climate research. To overcome this, we propose MSCloudCAM, a Cross-Attention with Multi-Scale Context Network tailored for multispectral and multi-sensor cloud segmentation. Our framework exploits the spectral richness of Sentinel-2 (CloudSEN12) and Landsat-8 (L8Biome) data to classify four semantic categories: clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a Swin Transformer backbone for hierarchical feature extraction with two multi-scale context modules, ASPP and PSP, for enhanced scale-aware learning. A Cross-Attention block enables effective multisensor and multispectral feature fusion, while an Efficient Channel Attention Block (ECAB) and a Spatial Attention Module adaptively refine the feature representations. Comprehensive experiments on CloudSEN12 and L8Biome demonstrate that MSCloudCAM delivers state-of-the-art segmentation accuracy, surpassing leading baseline architectures while maintaining competitive parameter efficiency and FLOPs. These results underscore the model's effectiveness and practicality, making it well-suited for large-scale Earth observation tasks and real-world applications.
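To make the attention-based refinement step more concrete, here is a minimal NumPy sketch of channel gating followed by spatial gating. It is an illustration under assumptions, not the authors' implementation: the channel gate follows the general efficient-channel-attention idea (global average pooling, a small 1D filter across channels, a sigmoid), and the spatial gate is simplified to a weighted sum of the channel-wise mean and max maps instead of a learned convolution. The function names, the fixed `kernel`, and the weights `w_avg` and `w_max` are hypothetical; the abstract does not give these details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, kernel):
    """Gate each channel of x (C, H, W) using a 1D filter over pooled channel statistics.

    Assumption: `kernel` stands in for learned 1D convolution weights (odd length).
    """
    pooled = x.mean(axis=(1, 2))                 # global average pool -> (C,)
    pad = len(kernel) // 2
    padded = np.pad(pooled, pad)                 # zero-pad so output keeps C entries
    mixed = np.array([padded[i:i + len(kernel)] @ kernel
                      for i in range(len(pooled))])
    gate = sigmoid(mixed)                        # per-channel weight in (0, 1)
    return x * gate[:, None, None], gate

def spatial_attention(x, w_avg, w_max):
    """Gate each pixel of x (C, H, W) using channel-wise mean and max maps.

    Assumption: a weighted sum replaces the learned convolution used in practice.
    """
    avg_map = x.mean(axis=0)                     # (H, W)
    max_map = x.max(axis=0)                      # (H, W)
    gate = sigmoid(w_avg * avg_map + w_max * max_map)
    return x * gate[None, :, :], gate

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 5, 5))            # toy feature map: 6 channels, 5x5 pixels
kernel = np.array([0.25, 0.5, 0.25])             # hypothetical filter weights
refined, ch_gate = channel_attention(features, kernel)
refined, sp_gate = spatial_attention(refined, w_avg=0.5, w_max=0.5)
```

Both gates only rescale the input, so the refined tensor keeps the shape of the original feature map and can be passed on to a segmentation head unchanged.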
Problem

Research questions and friction points this paper is trying to address.

Segmenting clouds in multispectral satellite imagery accurately
Overcoming cloud obstruction for environmental monitoring applications
Fusing multi-sensor data for improved cloud classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Swin Transformer backbone for feature extraction
Integrates ASPP and PSP for multi-scale context
Employs Cross-Attention block for multispectral feature fusion
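The cross-attention fusion listed above can be sketched as single-head scaled dot-product attention between two feature streams. This is a hedged illustration only: the summary does not say which features act as queries and which as keys and values, so `feat_a` and `feat_b` are hypothetical stand-ins, and the random projection matrices `w_q`, `w_k`, and `w_v` take the place of learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention.

    query_tokens:   (N_q, D) tokens from one feature stream
    context_tokens: (N_c, D) tokens from the other stream
    Returns the fused tokens (N_q, d) and the attention weights (N_q, N_c).
    """
    q = query_tokens @ w_q                       # (N_q, d)
    k = context_tokens @ w_k                     # (N_c, d)
    v = context_tokens @ w_v                     # (N_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # scaled dot products
    weights = softmax(scores, axis=-1)           # each query row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model, d_head = 8, 4
feat_a = rng.normal(size=(16, d_model))          # e.g. a 4x4 feature map, flattened
feat_b = rng.normal(size=(9, d_model))           # e.g. a 3x3 feature map, flattened
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

fused, attn = cross_attention(feat_a, feat_b, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Because the queries come from one stream and the keys and values from the other, the two inputs may have different spatial sizes; the output always has one fused token per query token.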
Md Abdullah Al Mazid
School of Computing and Information Sciences, Florida International University
Liangdong Deng
School of Computing and Information Sciences, Florida International University
Naphtali Rishe
Professor of Computer Science and the inaugural Outstanding Professor of FIU
geospatial databases, semantic databases, Geo and Health Big Data