Spatially Selective Self-Training for Unsupervised Building Change Detection

📅 2026-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing unsupervised building change detection methods. These methods often rely on generic temporal differences and are therefore susceptible to appearance variations, registration errors, and non-building interference, which makes it difficult to capture genuine structural changes to buildings. To overcome these challenges, the authors propose SST-CD, a framework that reformulates the task as end-to-end detector learning under noisy pseudo-supervision, with training restricted to spatially reliable pixels. Reliable pseudo-labels are selected with a local consistency criterion, and change modeling is stabilized by a lightweight feature adapter coupled with a prototype-based decoder. The method achieves F1 scores of 83.08%, 91.69%, and 86.60% on the LEVIR-CD, WHU-CD, and DSIFN-CD benchmarks, respectively, outperforming the existing unsupervised and label-free baselines it is compared against.
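To make the pseudo-label selection step concrete, here is a minimal sketch in plain Python. It is not the authors' code: the paper's exact local consistency criterion is not given here, so the functions `candidate_pseudo_labels` and `local_consistency_mask` and the parameters `threshold`, `radius`, and `min_agree` are all hypothetical. The assumed rule thresholds a temporal-discrepancy map into candidate labels, then keeps a pixel for supervision only if most of its neighbours share its label.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]


def candidate_pseudo_labels(diff: Grid, threshold: float) -> Mask:
    """Binarize a temporal-discrepancy map: 1 = candidate change, 0 = no change."""
    return [[1 if v >= threshold else 0 for v in row] for row in diff]


def local_consistency_mask(labels: Mask, radius: int = 1, min_agree: float = 0.8) -> Mask:
    """Return 1 where a pixel's pseudo-label agrees with most of its neighbours.

    Hypothetical criterion: among the in-bounds neighbours in a
    (2*radius+1) x (2*radius+1) window (centre excluded), the fraction that
    share the centre label must be at least `min_agree`.
    """
    h, w = len(labels), len(labels[0])
    reliable = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            agree = total = 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if (di, dj) != (0, 0) and 0 <= ni < h and 0 <= nj < w:
                        total += 1
                        agree += int(labels[ni][nj] == labels[i][j])
            reliable[i][j] = int(total > 0 and agree / total >= min_agree)
    return reliable


# Toy discrepancy map: a 2x2 changed block (top-left) plus one isolated
# high-response pixel (bottom-right) standing in for a registration artifact.
diff = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.2, 0.1],
    [0.1, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.6],
]
labels = candidate_pseudo_labels(diff, threshold=0.5)
reliable = local_consistency_mask(labels)
```

In this toy case the isolated pixel at the bottom-right receives a candidate "change" label but is excluded from supervision, because none of its neighbours agree with it. At this strict `min_agree`, pixels on the border of the genuine changed block are also excluded, which illustrates the trade-off between cleaner supervision and coverage that any such filter has to balance.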
📝 Abstract
Unsupervised building change detection aims to learn building-change masks from unlabeled bi-temporal remote sensing images. Existing label-free methods often follow a discrepancy-to-mask paradigm, directly using temporal differences, frozen foundation-model responses, prompt-based outputs, or post-processing results as final change maps. Although these strategies provide annotation-free cues, they do not learn a task-specific building-change detector and remain vulnerable to the gap between generic temporal discrepancies and building-defined structural changes. In practice, such discrepancies are often noisy and task-irrelevant, as appearance shifts, registration errors, and non-building modifications can produce strong but misleading responses. To address this problem, we propose SST-CD, a spatially selective self-training framework that reformulates fully label-free building change detection as end-to-end detector learning under noisy pseudo-supervision. SST-CD uses temporal discrepancies as candidate pseudo-labels and trains the detector only on spatially reliable pixels, whose reliability is estimated by a local consistency criterion that filters inconsistent regions from supervision. To further stabilize noisy self-training, a lightweight feature adapter recalibrates bi-temporal features, while a prototype-based decoder produces compact change and no-change representations. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show that SST-CD achieves F1 scores of 83.08%, 91.69%, and 86.60%, respectively, outperforming existing unsupervised and label-free baselines. Code will be made publicly available.
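"Trains the detector only on spatially reliable pixels" amounts to masking the training loss. The sketch below assumes a per-pixel binary cross-entropy, which the abstract does not specify; `masked_bce` and its arguments are hypothetical names. It averages the loss over reliable pixels only, so filtered pseudo-labels contribute nothing to training.

```python
import math
from typing import List


def masked_bce(probs: List[List[float]], pseudo: List[List[int]],
               reliable: List[List[int]], eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over reliable pixels only.

    Unreliable pixels are skipped, so their (possibly wrong) pseudo-labels
    never enter the loss.
    """
    total, count = 0.0, 0
    for p_row, y_row, r_row in zip(probs, pseudo, reliable):
        for p, y, r in zip(p_row, y_row, r_row):
            if not r:
                continue
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


probs = [[0.9, 0.2], [0.5, 0.1]]  # detector's change probabilities
pseudo = [[1, 0], [1, 0]]         # candidate pseudo-labels
reliable = [[1, 1], [0, 1]]       # pixel (1, 0) filtered out
loss = masked_bce(probs, pseudo, reliable)
```

Normalizing by the number of reliable pixels, rather than by the full image size, keeps the loss scale comparable when the filter discards a large fraction of an image. Whether SST-CD normalizes this way is not stated in the abstract.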
Problem

Research questions and friction points this paper is trying to address.

unsupervised building change detection
temporal discrepancy
noisy pseudo supervision
remote sensing
change detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatially selective self-training
unsupervised change detection
pseudo-label filtering
local consistency
prototype-based representation
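As a rough illustration of a prototype-based representation, the sketch below builds one "change" and one "no-change" prototype as the mean feature of reliable pixels in each class, then labels every pixel by cosine similarity to the nearest prototype. This nearest-mean classifier is a simple stand-in chosen for clarity, not the paper's decoder; the helper names (`build_prototypes`, `classify`, `cosine`, `mean_vector`) and the toy 2-D features are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(features: List[Vector], labels: List[int],
                     reliable: List[int]) -> Dict[int, Vector]:
    """Average reliable pixel features per class: 0 = no change, 1 = change."""
    groups: Dict[int, List[Vector]] = {0: [], 1: []}
    for f, y, r in zip(features, labels, reliable):
        if r:
            groups[y].append(f)
    return {c: mean_vector(vs) for c, vs in groups.items() if vs}


def classify(features: List[Vector], prototypes: Dict[int, Vector]) -> List[int]:
    """Assign each pixel to the prototype with the highest cosine similarity."""
    return [max(prototypes, key=lambda c: cosine(f, prototypes[c])) for f in features]


# Per-pixel features (e.g. bi-temporal difference features), flattened.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.3]]
labels = [1, 1, 0, 0, 0]    # last pseudo-label disagrees with its feature...
reliable = [1, 1, 1, 1, 0]  # ...and is filtered out as unreliable
prototypes = build_prototypes(features, labels, reliable)
predictions = classify(features, prototypes)
```

Because the mislabeled fifth pixel is excluded when the prototypes are built, it does not pull the "no-change" prototype toward change-like features, and it is still classified as change at prediction time. This shows one plausible reason why compact class prototypes combined with reliability filtering can make noisy self-training more stable.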