🤖 AI Summary
Cloud contamination in optical satellite imagery degrades the accuracy of environmental monitoring and land cover classification. To address this, we propose a multispectral, multi-sensor cloud segmentation network that uses Sentinel-2 and Landsat-8 data to segment imagery into four classes: clear sky, thin cloud, thick cloud, and cloud shadow. Our method introduces a cross-attention mechanism for fusing heterogeneous multi-sensor and multispectral features, integrated with a Swin Transformer backbone, ASPP and PSP multi-scale context modules, and a dual channel-spatial attention structure. Together these components improve scale adaptability and discriminative capability in complex cloud regions. Evaluated on the CloudSEN12 and L8Biome benchmarks, our model achieves state-of-the-art segmentation accuracy while remaining competitive with existing approaches in parameter count and computational cost, indicating strong potential for large-scale remote sensing applications.
📝 Abstract
Clouds remain a critical challenge in optical satellite imagery, hindering reliable analysis for environmental monitoring, land cover mapping, and climate research. To overcome this, we propose MSCloudCAM, a Cross-Attention with Multi-Scale Context Network tailored for multispectral and multi-sensor cloud segmentation. Our framework exploits the spectral richness of Sentinel-2 (CloudSEN12) and Landsat-8 (L8Biome) data to classify four semantic categories: clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a Swin Transformer backbone for hierarchical feature extraction with two multi-scale context modules, Atrous Spatial Pyramid Pooling (ASPP) and Pyramid Scene Parsing (PSP), for scale-aware learning. A Cross-Attention block fuses multi-sensor and multispectral features, while an Efficient Channel Attention Block (ECAB) and a Spatial Attention Module adaptively refine the feature representations. Comprehensive experiments on CloudSEN12 and L8Biome demonstrate that MSCloudCAM delivers state-of-the-art segmentation accuracy, surpassing leading baseline architectures while maintaining competitive parameter efficiency and FLOPs. These results underscore the model's effectiveness and practicality, making it well suited to large-scale Earth observation tasks and real-world applications.
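To make the fusion step concrete, the following is a minimal sketch of generic single-head scaled dot-product cross-attention, in which tokens from one feature stream attend to tokens from another. It is not the MSCloudCAM implementation: the abstract does not state which features act as queries and which as keys and values, so the names `stream_a` and `stream_b` are hypothetical, and the learned query, key, and value projections of a real attention layer are omitted (treated as identity) to keep the example dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: list of n_q tokens, each a list of d floats (one feature stream)
    keys:    list of n_k tokens, each a list of d floats (the other stream)
    values:  list of n_k tokens, each a list of d_v floats (the other stream)
    Returns n_q fused tokens of length d_v.

    Assumption: projections are identity; a trained model would apply
    learned linear maps to queries, keys, and values first.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    d_v = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value tokens.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        )
    return fused


# Hypothetical toy tokens: two from stream A, three from stream B.
stream_a = [[1.0, 0.0], [0.0, 1.0]]
stream_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused_tokens = cross_attention(stream_a, stream_b, stream_b)
print(len(fused_tokens), len(fused_tokens[0]))
```

Each output token is a convex combination of the value tokens, so the fused features stay within the range spanned by the second stream; in practice such a block is usually followed by a residual connection and normalization, which this sketch leaves out.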