🤖 AI Summary
To address severe class imbalance in remote sensing change detection, this paper proposes a multi-scale robust detection method based on a fine-tuned Segment Anything Model (SAM). The method introduces three key components: (1) a Cross-Entropy Masking (CEM) loss function that weights pixels by class, significantly enhancing sensitivity to sparse change regions; (2) a Spatial-Temporal Feature Enhancement (STFE) module that fuses bi-temporal features to strengthen change discriminability; and (3) a Multi-Scale Decoder Fusion (MSDF) architecture enabling adaptive aggregation of change features across scales. Extensive experiments on four benchmark datasets, Levir-CD, WHU-CD, CLCD, and S2Looking, demonstrate consistent state-of-the-art performance. Notably, the method achieves a 2.5% improvement in F1-score on S2Looking, validating both its effectiveness and its generalization across diverse change detection scenarios.
📝 Abstract
Foundation models have achieved significant success in diverse domains of computer vision. They learn general representations that transfer readily to tasks not seen during training. One such foundation model is the Segment Anything Model (SAM), which can accurately segment objects in images. We propose adapting the SAM encoder via fine-tuning for remote sensing change detection (RSCD), along with spatial-temporal feature enhancement (STFE) and multi-scale decoder fusion (MSDF), to detect changes robustly at multiple scales. Additionally, we propose a novel cross-entropy masking (CEM) loss to handle the high class imbalance in change detection datasets. Our method outperforms state-of-the-art (SOTA) methods on four change detection datasets: Levir-CD, WHU-CD, CLCD, and S2Looking. We achieve a 2.5% F1-score improvement on the large and complex S2Looking dataset. The code is available at: https://github.com/humza909/SAM-CEM-CD
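The abstract does not give the exact form of the CEM loss, but its stated goal, countering the dominance of unchanged pixels over sparse change pixels, can be illustrated with a minimal class-weighted cross-entropy sketch in NumPy. The function name `cem_loss`, the inverse-frequency weighting scheme, and all array shapes below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def cem_loss(probs, target, eps=1e-7):
    """Illustrative class-balanced cross-entropy (hypothetical form of CEM).

    probs:  (H, W) predicted change probabilities in (0, 1)
    target: (H, W) binary ground-truth change mask (1 = changed)
    """
    target = target.astype(np.float64)
    n = target.size
    n_pos = target.sum()          # sparse "change" pixels
    n_neg = n - n_pos             # dominant "no-change" pixels
    # Inverse-frequency weights: rare change pixels get large weight,
    # so they are not swamped by the no-change majority.
    w_pos = n / (2.0 * max(n_pos, 1.0))
    w_neg = n / (2.0 * max(n_neg, 1.0))
    weights = np.where(target == 1.0, w_pos, w_neg)
    ce = -(target * np.log(probs + eps)
           + (1.0 - target) * np.log(1.0 - probs + eps))
    return float((weights * ce).mean())
```

With a 16-pixel patch containing a single change pixel, the change pixel receives weight 8.0 while each no-change pixel receives about 0.53, so a missed change contributes far more to the loss than an equally wrong no-change prediction.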