🤖 AI Summary
This work addresses spurious change detections in heterogeneous remote sensing imagery (e.g., EO-SAR pairs) caused by temporal asynchrony, sensor discrepancies, and shifts in illumination, season, and modality. To this end, we propose FAF-CD, a frequency-aware multi-scale fusion framework that, for the first time, integrates Fourier and Haar-wavelet spectral features into multimodal change detection. Its rectification-aware tri-branch fusion module combines deformable spatial alignment with Fourier and Haar-wavelet comparisons, using adaptive gating to aggregate complementary spatial and frequency cues across scales. Built on a DINOv3-pretrained ConvNeXt encoder and a linear-complexity VMamba-based decoder, the model improves clean and perturbed tc-mIoU/tc-mAP over NeXt2Former-CD on BRIGHT validation, reaches 0.924 cF1 on LEVIR-CD and 0.955 cF1 on WHU-CD, obtains the best average perturbed cIoU/cF1 on both binary datasets, and reduces computational cost by approximately 24 GFLOPs relative to NeXt2Former-CD.
📝 Abstract
Remote sensing change detection for real-world monitoring often relies on imperfect heterogeneous observations, where pre- and post-event images may be asynchronous, cross-sensor, or affected by illumination, seasonal, and modality shifts. This setting is especially challenging for EO-SAR disaster mapping, where nuisance variation can resemble structural damage. We propose FAF-CD, a frequency-aware hybrid framework with a DINOv3-pretrained ConvNeXt encoder and a linear-complexity VMamba-based decoder. Its rectification-aware tri-branch fusion module combines deformable spatial alignment with Fourier and Haar-wavelet comparisons, using adaptive gating to aggregate complementary cues across scales. On BRIGHT validation, a matched heterogeneous EO-SAR adaptation improves clean and perturbed tc-mIoU/tc-mAP over NeXt2Former-CD. FAF-CD also generalizes to binary optical CD, achieving 0.924 cF1 on LEVIR-CD and 0.955 cF1 on WHU-CD, and obtains the best average perturbed cIoU/cF1 on both binary datasets among M-CD and NeXt2Former-CD under pseudo-change-aligned stress tests. It further reduces cost by approximately 24 GFLOPs relative to NeXt2Former-CD while maintaining or improving accuracy.
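To make the tri-branch idea concrete, the following is a minimal NumPy sketch of how a spatial difference, a Fourier comparison, and a Haar-wavelet comparison could be combined through softmax gating. It is not the FAF-CD implementation: the abstract does not specify the branch internals, so the phase-only Fourier comparison, the single-level Haar transform, the fixed gate logits, and all function names (`haar_dwt2`, `fourier_branch`, `haar_branch`, `gated_fusion`) are assumptions chosen for illustration. Deformable alignment, learned gates, and multi-scale features are omitted.

```python
import numpy as np

def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform of an (H, W) array (H, W even)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0   # local average (low-pass)
    d_rows = (a + b - c - d) / 2.0   # difference between the two rows of a block
    d_cols = (a - b + c - d) / 2.0   # difference between the two columns of a block
    d_diag = (a - b - c + d) / 2.0   # diagonal detail
    return approx, d_rows, d_cols, d_diag

def fourier_branch(pre, post, eps=1e-8):
    """Assumed Fourier comparison: difference of phase-only reconstructions.

    Dividing the spectrum by its amplitude makes the result insensitive to a
    global gain change, a simple stand-in for illumination-like nuisance shifts.
    """
    def phase_only(x):
        f = np.fft.fft2(x)
        return np.real(np.fft.ifft2(f / (np.abs(f) + eps)))
    return np.abs(phase_only(pre) - phase_only(post))

def haar_branch(pre, post):
    """Assumed wavelet comparison: summed absolute detail-subband differences."""
    _, *det_pre = haar_dwt2(pre)
    _, *det_post = haar_dwt2(post)
    diff = sum(np.abs(p - q) for p, q in zip(det_pre, det_post))
    # Nearest-neighbour upsampling back to the input resolution.
    return np.repeat(np.repeat(diff, 2, axis=0), 2, axis=1)

def gated_fusion(maps, gate_logits):
    """Softmax-weighted sum of K change maps; logits are fixed here, not learned."""
    stacked = np.stack(maps)                      # (K, H, W)
    logits = np.asarray(gate_logits, dtype=float)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return np.tensordot(weights, stacked, axes=1)  # (H, W)

# Synthetic example: an 8x8 "pre" image and a "post" image with a small change.
rng = np.random.default_rng(0)
pre = rng.random((8, 8))
post = pre.copy()
post[3:5, 3:5] += 1.0

maps = [np.abs(pre - post), fourier_branch(pre, post), haar_branch(pre, post)]
fused = gated_fusion(maps, gate_logits=[0.0, 0.0, 0.0])  # equal logits: plain average
```

With equal logits the gate reduces to averaging the three maps. In a trained model the logits would presumably be predicted from the features, so that a branch can be down-weighted where its cue is unreliable, for example the spatial difference under a strong modality shift.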