🤖 AI Summary
This work addresses the performance degradation that scarce annotations cause in few-shot land cover segmentation from remote sensing time-series imagery. It proposes SAM-Aug, a plug-and-play regularization strategy that leverages the Segment Anything Model (SAM) as a geometric prior, requiring neither additional annotations nor fine-tuning. Cloud-free composite images are synthesized from the time series, SAM masks are generated from them in an unsupervised manner, and a novel RegionSmoothLoss enforces prediction consistency within the SAM-derived regions. The method significantly improves segmentation performance under few-shot conditions, achieving an average mIoU of 36.21% on the PASTIS-R benchmark with only 5% labeled data, surpassing the previous state of the art by 2.33 percentage points (a relative gain of 6.89%) and reaching up to 40.28% mIoU with the best random seed.
📝 Abstract
Few-shot semantic segmentation of time-series remote sensing images remains a critical challenge, particularly in regions where labeled data is scarce or costly to obtain. While state-of-the-art models perform well under full supervision, their performance degrades significantly when labels are limited, which restricts their real-world applicability. In this work, we propose SAM-Aug, a new annotation-efficient framework that leverages the geometry-aware segmentation capability of the Segment Anything Model (SAM) to improve few-shot land cover mapping. Our approach constructs cloud-free composite images from temporal sequences and applies SAM in a fully unsupervised manner to generate geometry-aware mask priors. These priors are then integrated into training through a proposed loss function, RegionSmoothLoss, which enforces prediction consistency within each SAM-derived region across temporal frames, effectively regularizing the model to respect semantically coherent structures. Extensive experiments on the PASTIS-R benchmark under a 5 percent labeled setting demonstrate the effectiveness and robustness of SAM-Aug. Averaged over three random seeds (42, 2025, 4090), our method achieves a mean test mIoU of 36.21 percent, outperforming the state-of-the-art baseline by +2.33 percentage points, a relative improvement of 6.89 percent. Notably, on the most favorable split (seed=42), SAM-Aug reaches a test mIoU of 40.28 percent, representing an 11.2 percent relative gain with no additional labeled data. The consistent improvement across all seeds confirms the generalization power of leveraging foundation model priors under annotation scarcity. Our results highlight that vision foundation models such as SAM can serve as useful regularizers in few-shot remote sensing learning, offering a scalable and plug-and-play solution for land cover monitoring without requiring manual annotations or model fine-tuning.
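The core regularization idea, enforcing that predictions agree within each SAM-derived region, can be sketched as a penalty on the within-region variance of the model's class probabilities. The snippet below is a minimal NumPy illustration of this consistency term, not the paper's actual implementation: the function name `region_smooth_loss` and the variance formulation are assumptions for illustration, and the real loss may weight regions or aggregate across temporal frames differently.

```python
import numpy as np

def region_smooth_loss(probs, regions):
    """Hypothetical sketch of a region-consistency penalty.

    probs:   (H, W, C) softmax class probabilities from the segmentation model
    regions: (H, W) integer map of SAM-derived region ids

    For each region, computes the mean squared deviation of each pixel's
    probability vector from the region's mean probability vector, then
    averages over regions. The loss is zero iff predictions are identical
    within every region.
    """
    loss, n_regions = 0.0, 0
    for r in np.unique(regions):
        p = probs[regions == r]                 # (N_r, C) pixels of one region
        loss += np.mean((p - p.mean(axis=0)) ** 2)
        n_regions += 1
    return loss / max(n_regions, 1)

# Toy check: predictions uniform inside each region incur zero penalty.
probs = np.zeros((2, 2, 2))
probs[0, :, 0] = 1.0        # top row: all class 0
probs[1, :, 1] = 1.0        # bottom row: all class 1
regions = np.array([[0, 0],
                    [1, 1]])
print(region_smooth_loss(probs, regions))  # 0.0
```

In training, such a term would be added to the supervised segmentation loss on the few labeled pixels, so the SAM regions regularize the unlabeled majority of the image without requiring any extra annotation.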