🤖 AI Summary
To address the confusion and loss of spectral–spatial features that arise when fusing high-resolution panchromatic (PAN) and low-resolution multispectral (MS) images in remote sensing pansharpening, this paper proposes a multi-scale, wavelet-domain, spectral-aware fusion framework. Methodologically, it integrates wavelet transforms, multi-scale pyramids, and a customized self-attention mechanism, augmented with a frequency-domain feature-preservation constraint. Key contributions include: (1) a physically grounded Multi-Frequency Fusion Attention (MFFA) mechanism that constructs frequency queries, spatial keys, and fusion values to jointly model cross-modal spectral–spatial features; and (2) a wavelet-pyramid-guided paradigm for lossless cross-scale frequency-feature reconstruction and decoupled fusion, preserving spectral and structural integrity. Extensive experiments on multiple benchmark datasets demonstrate significant improvements over state-of-the-art methods in quantitative metrics (e.g., PSNR, SSIM), and qualitative results further confirm superior spectral fidelity, spatial-detail recovery, and robustness in real-world scenarios.
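The "lossless" property that motivates the wavelet-domain design can be illustrated with a minimal numpy sketch: a single-level 2D Haar transform splits an image into one coarse and three detail sub-bands, and the inverse transform recovers the input exactly. The helper names and the single-level Haar choice below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D orthonormal Haar decomposition: one coarse + three detail sub-bands."""
    p, q = x[0::2, 0::2], x[1::2, 0::2]
    r, s = x[0::2, 1::2], x[1::2, 1::2]
    ll = (p + q + r + s) / 2  # low-frequency approximation
    lh = (p - q + r - s) / 2  # high-frequency detail sub-band
    hl = (p + q - r - s) / 2  # high-frequency detail sub-band
    hh = (p - q - r + s) / 2  # diagonal detail sub-band
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2 (perfect reconstruction)."""
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

img = np.random.default_rng(0).random((8, 8))
recon = haar_idwt2(*haar_dwt2(img))
assert np.allclose(recon, img)  # frequencies separated, nothing lost
```

Because this transform is orthonormal, it also preserves signal energy across sub-bands, which is why frequency features can be fused per-band and recombined without degradation. Stacking such decompositions level by level yields the multi-scale wavelet pyramid the paper builds on.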
📝 Abstract
Pansharpening aims to combine a high-resolution panchromatic (PAN) image with a low-resolution multispectral (LRMS) image to produce a high-resolution multispectral (HRMS) image. Although pansharpening in the frequency domain offers clear advantages, most existing methods either operate solely in the spatial domain or fail to fully exploit the frequency domain's benefits. To address this issue, we propose Multi-Frequency Fusion Attention (MFFA), which leverages wavelet transforms to cleanly separate frequency components and to enable lossless reconstruction across different frequency sub-bands. We then generate a Frequency-Query, a Spatial-Key, and a Fusion-Value according to the physical meaning each feature carries, enabling more effective capture of frequency-specific information. We also emphasize preserving frequency features across the network's operations. At a broader level, our network employs a wavelet pyramid to progressively fuse information across multiple scales. Compared with previous frequency-domain approaches, our network better prevents the confusion and loss of distinct frequency features during fusion. Quantitative and qualitative experiments on multiple datasets demonstrate that our method outperforms existing approaches and generalizes well to real-world scenarios.
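The Frequency-Query / Spatial-Key / Fusion-Value idea can be sketched with generic scaled dot-product cross-attention: queries drawn from frequency (wavelet) features attend over keys from PAN spatial features to mix values from fused features. The variable names (`freq_q`, `spat_k`, `fuse_v`) and toy dimensions are illustrative assumptions; the actual MFFA module is a learned network component, not this plain numpy routine:

```python
import numpy as np

def scaled_dot_attention(q, k, v):
    """Generic attention: softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # rows sum to 1
    return weights @ v

rng = np.random.default_rng(0)
d = 16                              # illustrative feature dimension
freq_q = rng.normal(size=(4, d))    # queries: frequency-domain (wavelet) features
spat_k = rng.normal(size=(6, d))    # keys: PAN spatial features
fuse_v = rng.normal(size=(6, d))    # values: fused spectral-spatial features
out = scaled_dot_attention(freq_q, spat_k, fuse_v)
print(out.shape)                    # (4, 16)
```

Each output row is a convex combination of the fusion values, weighted by how strongly a given frequency query matches the spatial keys, which is one way to read the "physical meaning" assignment of the three roles.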