🤖 AI Summary
To address the longstanding trade-off between temporal and spatial resolution in remote sensing land surface temperature (LST) products, this paper proposes an end-to-end weakly supervised generative network that estimates daily 10-meter LST through spatio-temporal fusion, without ground-truth labels. The method combines daily but spatially coarse Terra MODIS observations with spatially fine but less frequent Landsat 8 and Sentinel-2 observations within a conditional generative adversarial network (cGAN) framework. Key components include multi-level encoders, temporal attention mechanisms, cosine-similarity-based feature fusion, a PatchGAN discriminator, and a physically grounded averaging constraint; a Gaussian filter is applied to the reconstructed LST to suppress high-frequency noise. On average, the approach reduces RMSE by 17.18% and improves SSIM by 11.00% over the best-performing baseline. Validation against 33 ground-based sensors further indicates that it captures fine-scale thermal patterns and remains robust under cloud-affected conditions.
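The averaging constraint mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `block_mean` and `averaging_loss`, the RMSE form of the penalty, and the scale factor of 3 are assumptions made for illustration. The idea shown is only that a fine-resolution prediction, once averaged over blocks of pixels, should agree with a coarser LST observation of the same area.

```python
import numpy as np

def block_mean(fine: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a 2-D array."""
    h, w = fine.shape
    if h % factor or w % factor:
        raise ValueError("array dimensions must be divisible by factor")
    # Split each axis into (blocks, factor) and average over the two factor axes.
    return fine.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def averaging_loss(pred_fine: np.ndarray, coarse_ref: np.ndarray, factor: int) -> float:
    """RMSE between the block-averaged prediction and the coarse reference.

    Hypothetical penalty: the paper's actual loss may differ.
    """
    diff = block_mean(pred_fine, factor) - coarse_ref
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example: a 6x6 fine grid checked against a 2x2 coarse grid (factor 3, assumed).
pred = np.arange(36, dtype=float).reshape(6, 6)
ref = block_mean(pred, 3)

consistent = averaging_loss(pred, ref, 3)    # prediction agrees with the reference
biased = averaging_loss(pred + 1.0, ref, 3)  # uniform 1-unit bias, loss close to 1
print(consistent, biased)
```

Because the penalty only compares block averages, it constrains the mean of each block and leaves the within-block detail to be shaped by the other training signals, such as the adversarial term.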
📝 Abstract
Urbanization, climate change, and agricultural stress are increasing the demand for precise and timely environmental monitoring. Land Surface Temperature (LST) is a key variable in this context and is retrieved from satellite remote sensing. However, satellite systems face a trade-off between spatial and temporal resolution. While spatio-temporal fusion methods offer promising solutions, few have addressed the estimation of daily LST at 10 m resolution. In this study, we present WGAST, a Weakly-Supervised Generative Network for Daily 10 m LST Estimation via Spatio-Temporal Fusion of Terra MODIS, Landsat 8, and Sentinel-2. WGAST is the first end-to-end deep learning framework designed for this task. It adopts a conditional generative adversarial architecture, with a generator composed of four stages: feature extraction, fusion, LST reconstruction, and noise suppression. The first stage employs a set of encoders to extract multi-level latent representations from the inputs, which are then fused in the second stage using cosine similarity, normalization, and temporal attention mechanisms. The third stage decodes the fused features into high-resolution LST, and the fourth applies a Gaussian filter to suppress high-frequency noise. Training follows a weakly supervised strategy based on physical averaging principles and reinforced by a PatchGAN discriminator. Experiments demonstrate that WGAST outperforms existing methods in both quantitative and qualitative evaluations. Compared to the best-performing baseline, WGAST reduces RMSE by 17.18% and improves SSIM by 11.00% on average. Furthermore, WGAST is robust to cloud-induced LST contamination and effectively captures fine-scale thermal patterns, as validated against 33 ground-based sensors. The code is available at https://github.com/Sofianebouaziz1/WGAST.git.
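As a rough illustration of the fusion stage described above, the following NumPy sketch weights several reference feature maps by their per-pixel cosine similarity to a target feature map and normalizes the weights with a softmax. This is a simplified stand-in rather than WGAST's actual fusion module: the function names, the array shapes, and the choice of softmax as the normalization are assumptions, and the temporal attention mechanism is not modeled.

```python
import numpy as np

def cosine_similarity_map(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-pixel cosine similarity along the channel axis of two (C, H, W) arrays."""
    dot = (a * b).sum(axis=0)
    norm = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return dot / (norm + eps)

def fuse_by_similarity(target: np.ndarray, references: np.ndarray) -> np.ndarray:
    """Blend reference features, weighted by similarity to the target.

    target: (C, H, W); references: (T, C, H, W). Returns (C, H, W).
    Hypothetical fusion rule used only for illustration.
    """
    sims = np.stack([cosine_similarity_map(target, r) for r in references])  # (T, H, W)
    # Numerically stable softmax over the T reference maps, per pixel.
    weights = np.exp(sims - sims.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    # Broadcast weights over channels and sum across the T references.
    return (weights[:, None] * references).sum(axis=0)

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8, 8))         # 4 channels on an 8x8 grid (toy sizes)
references = rng.normal(size=(3, 4, 8, 8))  # 3 reference feature maps
fused = fuse_by_similarity(target, references)
print(fused.shape)
```

In this sketch, references whose features resemble the target at a given pixel contribute more to the fused value there, so the weighting adapts spatially instead of applying one global weight per reference.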