🤖 AI Summary
To address the heavy reliance on expert intervention and pre-annotated data in remote sensing image labeling, this paper proposes an unsupervised automatic annotation method for Sentinel-2 imagery. The method first performs initial image segmentation by jointly modeling color and spatial similarity, constructing a region-level graph structure. It then introduces a rotation-invariant graph neural network to aggregate neighborhood topological relationships and learn robust unsupervised feature representations. Finally, it enables efficient clustering of homogeneous geographic regions and semantically consistent labeling. Unlike conventional supervised paradigms, the approach requires no prior labels, significantly improving annotation efficiency, scalability, and interpretability. Experimental results demonstrate superior performance in fine-grained land-cover classification and outlier suppression, validating its effectiveness for large-scale, label-free remote sensing analysis.
📝 Abstract
Machine learning for remote sensing imagery relies on up-to-date and accurate labels for model training and testing. Labelling remote sensing imagery is time- and cost-intensive, requiring expert analysis. Previous labelling tools rely on pre-labelled data for training in order to label new, unseen data. In this work, we define an unsupervised pipeline for finding and labelling geographical areas of similar context and content within Sentinel-2 satellite imagery. Our approach removes limitations of previous methods by combining segmentation with convolutional and graph neural networks to encode a more robust feature space for image comparison. Unlike previous approaches, we segment the image into homogeneous regions of pixels grouped by colour and spatial similarity. Graph neural networks then aggregate information about the surrounding segments, enabling each feature representation to encode its local neighbourhood whilst preserving its own local information. This reduces outliers in the labelling tool, allows users to label at a granular level, and enables a rotationally invariant semantic relationship to be formed at the image level within the encoding space.
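The pipeline the abstract describes (colour/spatial segmentation, a region-level graph, neighbourhood aggregation, then clustering of region embeddings) can be sketched end to end in NumPy. Everything below is an illustrative assumption rather than the paper's implementation: the image is synthetic, a fixed 2x2 grid stands in for a proper colour/spatial superpixel segmentation, and a single mean-neighbour aggregation stands in for the rotation-invariant graph neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8x8 RGB patch with two colour fields (stand-in for a Sentinel-2 tile).
image = np.zeros((8, 8, 3))
image[:, :4] = [0.2, 0.6, 0.2]          # "vegetation"
image[:, 4:] = [0.1, 0.2, 0.7]          # "water"
image += rng.normal(0.0, 0.02, image.shape)

# 1. Segment into homogeneous regions. Here: a fixed 2x2 grid of 16 segments,
#    a crude stand-in for colour/spatial superpixel segmentation.
seg = np.arange(16).reshape(4, 4).repeat(2, axis=0).repeat(2, axis=1)
n = seg.max() + 1

# 2. Region-level features: mean colour of each segment.
feats = np.stack([image[seg == r].mean(axis=0) for r in range(n)])

# 3. Region adjacency graph: segments sharing a pixel border are connected.
adj = np.zeros((n, n))
for s, t in [(seg[:, :-1], seg[:, 1:]), (seg[:-1, :], seg[1:, :])]:
    edge = s != t
    adj[s[edge], t[edge]] = adj[t[edge], s[edge]] = 1.0

# 4. One round of mean-neighbour aggregation: each region keeps its own
#    features plus an order-free summary of its neighbours (a minimal
#    stand-in for the paper's GNN layer).
deg = adj.sum(axis=1, keepdims=True).clip(min=1)
embed = np.concatenate([feats, adj @ feats / deg], axis=1)

# 5. Cluster the region embeddings into 2 groups with a tiny k-means.
centres = embed[[0, n - 1]]             # init from two opposite corners
for _ in range(10):
    labels = np.argmin(((embed[:, None] - centres) ** 2).sum(-1), axis=1)
    centres = np.stack([embed[labels == k].mean(axis=0) for k in range(2)])

label_map = labels[seg]                 # per-pixel label image
```

Because step 4 averages over neighbours without ordering them, a region's embedding does not change when its neighbourhood is rotated around it, which is the invariance property the abstract highlights.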