🤖 AI Summary
Existing remote sensing instance segmentation methods are largely confined to closed-vocabulary settings, exhibiting limited generalization to novel categories or cross-dataset scenarios. To address this, we propose an open-vocabulary remote sensing instance segmentation framework that innovatively incorporates multi-granularity scene context modeling: region-aware fusion enhances local object discriminability, global context adaptation improves vision-language alignment, and a CLIP-driven open-vocabulary learning mechanism enables zero-shot category recognition. This work is the first to systematically integrate deep scene contextual reasoning into the open-vocabulary segmentation paradigm, effectively mitigating challenges posed by complex terrain, seasonal variations, and small-scale objects. Extensive experiments demonstrate state-of-the-art performance across multiple remote sensing benchmarks—including RSIS, RSISS, and RSOD—while significantly improving generalizability and practical utility for large-scale geospatial analysis.
📝 Abstract
Most existing remote sensing instance segmentation approaches are designed for closed-vocabulary prediction, limiting their ability to recognize novel categories or generalize across datasets. This restricts their applicability in diverse Earth observation scenarios. To address this, we introduce open-vocabulary (OV) learning for remote sensing instance segmentation. While current OV segmentation models perform well on natural image datasets, their direct application to remote sensing faces challenges such as diverse landscapes, seasonal variations, and the presence of small or ambiguous objects in aerial imagery. To overcome these challenges, we propose $\textbf{SCORE}$ ($\textbf{S}$cene $\textbf{C}$ontext matters in $\textbf{O}$pen-vocabulary $\textbf{RE}$mote sensing instance segmentation), a framework that integrates multi-granularity scene context, i.e., regional context and global context, to enhance both visual and textual representations. Specifically, we introduce Region-Aware Integration, which refines class embeddings with regional context to improve object distinguishability. Additionally, we propose Global Context Adaptation, which enriches naive text embeddings with remote sensing global context, creating a more adaptable and expressive linguistic latent space for the classifier. We establish new benchmarks for OV remote sensing instance segmentation across diverse datasets. Experimental results demonstrate that our proposed method achieves state-of-the-art performance, providing a robust solution for large-scale, real-world geospatial analysis. Our code is available at https://github.com/HuangShiqi128/SCORE.
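The abstract describes the two modules only at a high level. As a loose illustration (not the paper's actual implementation), Region-Aware Integration can be read as attention-style fusion of class embeddings with region features, and Global Context Adaptation as a scene-level shift of text embeddings followed by re-normalization for CLIP-style cosine matching. All function names, shapes, and the residual/additive fusion choices below are hypothetical stand-ins:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_aware_integration(class_emb, region_feats):
    """Sketch: refine each class embedding with regional context via
    scaled dot-product attention over region features (hypothetical)."""
    d = class_emb.shape[-1]
    attn = softmax(class_emb @ region_feats.T / np.sqrt(d))  # (C, R) weights
    context = attn @ region_feats                            # (C, D) pooled context
    return class_emb + context                               # residual fusion

def global_context_adaptation(text_emb, global_vec):
    """Sketch: shift naive text embeddings toward a scene-level global
    context vector, then re-normalize for cosine-similarity matching."""
    adapted = text_emb + global_vec[None, :]
    return adapted / np.linalg.norm(adapted, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
C, R, D = 5, 8, 16  # classes, regions, embedding dim (illustrative sizes)
cls = rng.standard_normal((C, D))
regions = rng.standard_normal((R, D))
g = rng.standard_normal(D)

refined = region_aware_integration(cls, regions)
adapted = global_context_adaptation(refined, g)
print(refined.shape, adapted.shape)  # (5, 16) (5, 16)
```

The re-normalization step matters because CLIP-style classifiers score regions by cosine similarity, so any additive context shift must be projected back onto the unit sphere before matching.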