LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation

2025-09-26
AI Summary
Existing remote sensing change detection methods rely predominantly on unimodal visual features and neglect textual semantic guidance, which limits their accuracy and robustness. To address this, the authors propose LG-CD, a language-guided change detection framework. LG-CD uses SAM2 as a multi-scale visual backbone and introduces two components: a Text Fusion Attention Module (TFAM), which aligns visual and textual information, and a cross-attention-based Vision-Semantic Fusion Decoder (V-SFD), which fuses the two modalities to localize changes and produce the final change mask. Multi-layer adapters provide parameter-efficient fine-tuning of the backbone for the remote sensing task. Experiments on the LEVIR-CD, WHU-CD, and SYSU-CD benchmarks show that LG-CD outperforms state-of-the-art change detection methods. The work also offers new insights into generalized change detection based on multimodal information.

Abstract
Remote Sensing Change Detection (RSCD) typically identifies changes in land cover or surface conditions by analyzing multi-temporal images. Currently, most deep learning-based methods primarily focus on learning unimodal visual information, while neglecting the rich semantic information provided by multimodal data such as text. To address this limitation, we propose a novel Language-Guided Change Detection model (LG-CD). This model leverages natural language prompts to direct the network's attention to regions of interest, significantly improving the accuracy and robustness of change detection. Specifically, LG-CD utilizes a visual foundational model (SAM2) as a feature extractor to capture multi-scale pyramid features from high-resolution to low-resolution across bi-temporal remote sensing images. Subsequently, multi-layer adapters are employed to fine-tune the model for downstream tasks, ensuring its effectiveness in remote sensing change detection. Additionally, we design a Text Fusion Attention Module (TFAM) to align visual and textual information, enabling the model to focus on target change regions using text prompts. Finally, a Vision-Semantic Fusion Decoder (V-SFD) is implemented, which deeply integrates visual and semantic information through a cross-attention mechanism to produce highly accurate change detection masks. Our experiments on three datasets (LEVIR-CD, WHU-CD, and SYSU-CD) demonstrate that LG-CD consistently outperforms state-of-the-art change detection methods. Furthermore, our approach provides new insights into achieving generalized change detection by leveraging multimodal information.
Problem

Research questions and friction points this paper is trying to address.

Enhancing change detection accuracy using language guidance and visual models
Integrating text prompts to focus on specific change regions in imagery
Aligning multimodal data through vision-semantic fusion for robust detection
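
The idea of letting text prompts steer attention over image regions can be illustrated with a minimal single-head cross-attention sketch. This is not the paper's TFAM or V-SFD implementation: the function name `text_guided_cross_attention`, the absolute-difference change features, the random stand-in embeddings, and all dimensions are assumptions chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_cross_attention(visual_tokens, text_tokens, w_q, w_k, w_v):
    """Hypothetical helper: visual tokens (N, d) query text tokens (T, d)."""
    q = visual_tokens @ w_q                    # queries from the image side, (N, d_k)
    k = text_tokens @ w_k                      # keys from the text side, (T, d_k)
    v = text_tokens @ w_v                      # values from the text side, (T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product scores, (N, T)
    attn = softmax(scores, axis=-1)            # each visual token weights the text tokens
    fused = visual_tokens + attn @ v           # residual fusion of text into visual tokens
    return fused, attn

rng = np.random.default_rng(0)
d, d_k = 16, 8

# Stand-ins for backbone features of the two acquisition dates (8x8 map, flattened).
feat_t1 = rng.normal(size=(64, d))
feat_t2 = rng.normal(size=(64, d))
# Assumption: absolute difference as a simple bi-temporal change representation.
diff_tokens = np.abs(feat_t1 - feat_t2)

# Stand-in for an encoded text prompt with 5 tokens.
text_tokens = rng.normal(size=(5, d))

w_q = rng.normal(size=(d, d_k)) * 0.1
w_k = rng.normal(size=(d, d_k)) * 0.1
w_v = rng.normal(size=(d, d)) * 0.1

fused, attn = text_guided_cross_attention(diff_tokens, text_tokens, w_q, w_k, w_v)
```

Here `fused` keeps the spatial layout of the visual tokens, so a decoder could reshape it back to an 8x8 map and predict a mask from it, while `attn` shows how strongly each location attends to each prompt token.
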
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses SAM2 as visual feature extractor
Employs adapters for fine-tuning downstream tasks
Integrates vision-text fusion via cross-attention mechanism
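
Adapter-based fine-tuning is often realized as a small bottleneck module added to a frozen backbone layer. The sketch below shows that common pattern, not the specific adapter design used in LG-CD: the class name `BottleneckAdapter`, the ReLU nonlinearity, the zero initialization of the up-projection, and the dimensions are all assumptions.

```python
import numpy as np

class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: down-project, ReLU, up-project."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.normal(scale=0.02, size=(dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Zero-initialized up-projection: the adapter starts as an identity mapping,
        # so inserting it does not disturb the pretrained features at first.
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def __call__(self, x):
        h = np.maximum(x @ self.w_down + self.b_down, 0.0)  # ReLU in the bottleneck
        return x + h @ self.w_up + self.b_up                 # residual connection

    def num_params(self):
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))

rng = np.random.default_rng(0)
dim, bottleneck = 256, 32
adapter = BottleneckAdapter(dim, bottleneck, rng)

x = rng.normal(size=(10, dim))   # stand-in for 10 tokens from a frozen backbone layer
y = adapter(x)

adapter_params = adapter.num_params()    # 256*32 + 32 + 32*256 + 256 = 16672
dense_layer_params = dim * dim + dim     # a full 256x256 linear layer: 65792
```

In this setup only the adapter weights would be trained while the backbone stays frozen, which is why the trainable parameter count stays small relative to a full layer of the same width.
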
Authors
Yixiao Liu
College of Computer Science, Sichuan University, China
Yizhou Yang
College of Computer Science, Sichuan University, China
Jinwen Li
School of Computer Science and Technology, Xinjiang University, China
Jun Tao
School of Computer Science and Engineering, Sun Yat-sen University
Scientific visualization, user interface and interaction, visual analytics, software visualization
Ruoyu Li
College of Computer Science, Sichuan University, China
Xiangkun Wang
University of Science and Technology
steganography, diffusion model
Min Zhu
College of Computer Science, Sichuan University, China
Junlong Cheng
Sichuan University
Artificial intelligence, Medical Image Analysis