🤖 AI Summary
High-resolution remote sensing imagery is difficult to acquire because of the limited spatial and temporal resolution of satellite sensors and the high cost of frequent observation. To address this, the paper proposes MWT-Diff, a super-resolution method that combines wavelet transforms with latent diffusion models. Its core component is a metadata-, wavelet-, and time-aware encoder (MWT-Encoder), which produces embeddings that capture physical metadata, multi-scale frequency-domain features, and temporal relationships. These embeddings guide a hierarchical diffusion process that reconstructs textures, edges, and high-frequency spectral details from low-resolution inputs. Across multiple remote sensing datasets, the framework compares favorably with recent approaches on the perceptual quality metrics FID (Fréchet Inception Distance) and LPIPS (Learned Perceptual Image Patch Similarity), suggesting potential for applications such as environmental monitoring and disaster response.
📝 Abstract
The acquisition of high-resolution satellite imagery is often constrained by the spatial and temporal limitations of satellite sensors, as well as the high costs associated with frequent observations. These challenges hinder applications such as environmental monitoring, disaster response, and agricultural management, which require fine-grained and high-resolution data. In this paper, we propose MWT-Diff, an innovative framework for satellite image super-resolution (SR) that combines latent diffusion models with wavelet transforms to address these challenges. At the core of the framework is a novel metadata-, wavelet-, and time-aware encoder (MWT-Encoder), which generates embeddings that capture metadata attributes, multi-scale frequency information, and temporal relationships. The embedded feature representations steer the hierarchical diffusion dynamics, through which the model progressively reconstructs high-resolution satellite imagery from low-resolution inputs. This process preserves critical spatial characteristics including textural patterns, boundary discontinuities, and high-frequency spectral components essential for detailed remote sensing analysis. The comparative analysis of MWT-Diff across multiple datasets demonstrated favorable performance compared to recent approaches, as measured by standard perceptual quality metrics including FID and LPIPS.
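To make "multi-scale frequency information" concrete, the following is a minimal NumPy sketch of a single-level 2D Haar wavelet decomposition and its inverse. It is an illustration only: the abstract does not state which wavelet family or decomposition depth MWT-Diff uses, so the Haar basis and the function names `haar_dwt2` and `haar_idwt2` are assumptions made for this example, not the authors' implementation. The transform splits an image into one half-resolution approximation band and three detail bands holding the edge and texture content that super-resolution has to recover.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2D Haar transform (orthonormal scaling).

    Returns four half-resolution bands:
    (approx, row_detail, col_detail, diag_detail).
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    approx = (a + b + c + d) / 2.0       # low-frequency content
    row_detail = (a + b - c - d) / 2.0   # change between the two rows
    col_detail = (a - b + c - d) / 2.0   # change between the two columns
    diag_detail = (a - b - c + d) / 2.0  # diagonal change
    return approx, row_detail, col_detail, diag_detail


def haar_idwt2(approx, row_detail, col_detail, diag_detail):
    """Invert haar_dwt2 and return the full-resolution image."""
    h, w = approx.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (approx + row_detail + col_detail + diag_detail) / 2.0
    out[0::2, 1::2] = (approx + row_detail - col_detail - diag_detail) / 2.0
    out[1::2, 0::2] = (approx - row_detail + col_detail - diag_detail) / 2.0
    out[1::2, 1::2] = (approx - row_detail - col_detail + diag_detail) / 2.0
    return out


if __name__ == "__main__":
    # Synthetic stand-in for a single-band image tile.
    tile = np.random.default_rng(0).random((8, 8))
    bands = haar_dwt2(tile)
    print([band.shape for band in bands])
    print(np.allclose(haar_idwt2(*bands), tile))
```

Applying `haar_dwt2` again to the approximation band yields the next coarser scale, which is one common way to build a multi-scale representation. How the sub-bands are actually encoded and combined with metadata and temporal embeddings in the MWT-Encoder is not described in the abstract and is not modeled here.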