🤖 AI Summary
This work addresses three limitations of existing semantic change detection methods: inadequate cross-temporal alignment, weak multi-scale representation, and poor robustness to pseudo-changes caused by illumination variation, seasonal change, and registration noise. The authors propose SemDINO, an end-to-end network whose dual-branch encoder fuses CNN features with frozen DINOv3 representations through gated pyramid fusion, followed by a multi-scale temporal bidirectional transformer interaction (M-TBTT) module for cross-temporal alignment. The architecture further incorporates semantic purification (SCP), bidirectional change enhancement (BiChangeEnhance), and multi-scale change enhancement (MCE), which operate together with a decoupled multi-task prediction head. Experiments on public remote sensing change detection datasets are reported to show that SemDINO outperforms state-of-the-art approaches in detection accuracy and generalization, particularly in complex scenes with interference factors.
📝 Abstract
Semantic change detection (SCD) aims to simultaneously locate land-cover changes and identify the semantic categories before and after the change. However, existing methods suffer from insufficient cross-temporal alignment, weak multi-scale representation, and poor robustness to pseudo-changes caused by illumination, season, and registration noise. To address these issues, we propose a novel end-to-end semantic change detection network named SemDINO, which integrates a dual-branch encoder, multi-scale temporal interaction, semantic purification, change enhancement, and decoupled multi-task prediction into a unified framework. Specifically, we construct a dual-branch encoder that combines a CNN backbone with frozen DINOv3 features via gated pyramid fusion, enabling rich multi-scale semantic representation. A multi-scale temporal bidirectional transformer interaction (M-TBTT) module then performs global cross-temporal feature alignment and information exchange. To further enhance genuine changes and suppress pseudo-changes, we introduce semantic purification (SCP), bidirectional change enhancement (BiChangeEnhance), and multi-scale change enhancement (MCE) modules that work together. Finally, a multi-branch CD prediction head jointly outputs a binary change mask, bi-temporal semantic maps, and an edge constraint. Extensive experiments on public remote sensing CD datasets demonstrate that SemDINO achieves superior performance and generalization ability over state-of-the-art methods, especially in complex scenarios with interference factors.
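The abstract does not specify how the gated pyramid fusion is computed. As a rough illustration only, the sketch below assumes one common form of gating: a sigmoid gate per channel that forms a convex combination of the CNN feature and the DINOv3 feature, applied independently at each pyramid level. The function names (`gated_fuse`, `fuse_pyramid`) and the gate parameters (`gate_weights`, `gate_bias`) are hypothetical and are not taken from the paper; feature vectors are plain Python lists standing in for one spatial location.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(cnn_feat, dino_feat, gate_weights=(0.0, 0.0), gate_bias=0.0):
    """Fuse two equal-length feature vectors with a per-channel sigmoid gate.

    Assumed form (not confirmed by the source):
        g     = sigmoid(w_cnn * c + w_dino * d + b)
        fused = g * c + (1 - g) * d
    so each fused channel lies between the two input channels.
    """
    if len(cnn_feat) != len(dino_feat):
        raise ValueError("feature vectors must have the same length")
    w_cnn, w_dino = gate_weights
    fused = []
    for c, d in zip(cnn_feat, dino_feat):
        g = sigmoid(w_cnn * c + w_dino * d + gate_bias)
        fused.append(g * c + (1.0 - g) * d)
    return fused


def fuse_pyramid(cnn_pyramid, dino_pyramid, gate_weights=(0.0, 0.0), gate_bias=0.0):
    """Apply the same gate to every level of two aligned feature pyramids."""
    if len(cnn_pyramid) != len(dino_pyramid):
        raise ValueError("pyramids must have the same number of levels")
    return [
        gated_fuse(c, d, gate_weights, gate_bias)
        for c, d in zip(cnn_pyramid, dino_pyramid)
    ]


if __name__ == "__main__":
    # Two pyramid levels, each reduced to a short channel vector.
    cnn = [[1.0, 3.0], [0.5, 0.5, 2.0]]
    dino = [[3.0, 1.0], [1.5, 0.5, 0.0]]
    print(fuse_pyramid(cnn, dino))
```

With zero gate weights and zero bias the gate is 0.5 everywhere, so the fusion reduces to a plain average of the two branches; a strongly positive bias pushes the output toward the CNN branch and a strongly negative bias toward the DINOv3 branch. In a real network the gate parameters would be learned, typically as a convolution over the concatenated features.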