🤖 AI Summary
To address critical challenges in ground-based remote sensing cloud image sequence extrapolation—including insufficient adaptive multi-scale feature extraction, weak long-range spatiotemporal dependency modeling, and high computational overhead of attention mechanisms—this paper proposes the Unified Spatiotemporal Fusion Network (USF-Net). Methodologically, USF-Net introduces a spatiotemporal guidance module that integrates adaptive large-kernel convolutions with a low-complexity attention mechanism to enable dynamic multi-scale contextual capture and long-term temporal modeling. It further incorporates state-space models, optical-flow-guided spatiotemporal dependency learning, temporal gating, and dynamic sparsity to suppress ghosting artifacts and enhance prediction coherence and sharpness. Additionally, the authors construct ASI-CIS, a high-quality cloud image dataset. Experiments demonstrate that USF-Net achieves a superior accuracy–efficiency trade-off, significantly outperforming state-of-the-art methods and establishing a new, efficient, and reliable paradigm for cloud image extrapolation in photovoltaic forecasting.
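The paper does not spell out its attention variant, but the "low-complexity attention mechanism" it claims can be illustrated with a standard kernelized linear-attention sketch: by applying a positive feature map to queries and keys and computing K^T V first, the cost drops from O(N²) in sequence length to O(N). The function name and shapes below are illustrative, not the paper's API.

```python
import numpy as np

def linear_attention(q, k, v):
    """O(N) attention sketch: map Q and K through a positive feature map,
    then compute (K^T V) first so the cost is linear in sequence length N
    rather than quadratic. q, k, v: (N, d) arrays."""
    # elu(x) + 1 keeps features positive, as in kernelized attention
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(q), phi(k)
    kv = k.T @ v                  # (d, d) summary — independent of N
    z = q @ k.sum(axis=0)         # (N,) normalizer
    return (q @ kv) / z[:, None]  # (N, d)
```

Because matrix multiplication is associative, this produces exactly the same output as the quadratic formulation `(phi(Q) phi(K)^T) V` with row normalization, just at lower cost.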
📝 Abstract
Ground-based remote sensing cloud image sequence extrapolation is a key research area in the development of photovoltaic power systems. However, existing approaches exhibit several limitations: (1) they primarily rely on static kernels to augment feature information, lacking adaptive mechanisms to dynamically extract features at varying resolutions; (2) temporal guidance is insufficient, leading to suboptimal modeling of long-range spatiotemporal dependencies; and (3) the quadratic computational cost of attention mechanisms is often overlooked, limiting efficiency in practical deployment. To address these challenges, we propose USF-Net, a Unified Spatiotemporal Fusion Network that integrates adaptive large-kernel convolutions and a low-complexity attention mechanism, combining temporal flow information within an encoder-decoder framework. Specifically, the encoder employs three basic layers to extract features, followed by the USTM, which comprises: (1) a SiB equipped with an SSM that dynamically captures multi-scale contextual information, and (2) a TiB featuring a TAM that effectively models long-range temporal dependencies while maintaining computational efficiency. In addition, a DSM with a TGM is introduced to enable unified modeling of temporally guided spatiotemporal dependencies. On the decoder side, a DUM is employed to address the common "ghosting effect"; it utilizes the initial temporal state as an attention operator to preserve critical motion signatures. As a key contribution, we also introduce and release the ASI-CIS dataset. Extensive experiments on ASI-CIS demonstrate that USF-Net significantly outperforms state-of-the-art methods, establishing a superior balance between prediction accuracy and computational efficiency for ground-based cloud extrapolation. The dataset and source code will be available at https://github.com/she1110/ASI-CIS.
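The abstract mentions a temporal gating mechanism (TGM) without giving its form. A common way to realize temporal gating, shown below as a minimal sketch (the weight shapes and blending rule are assumptions, not the paper's definition), is a learned sigmoid gate that decides, per feature, how much of an aggregated temporal state to blend into the current frame's features:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def temporal_gate(h_t, m_t, w_g, b_g):
    """Sigmoid-gated blend of current features h_t with an aggregated
    temporal state m_t. h_t, m_t: (d,) vectors; w_g: (d, 2d) gate
    weights; b_g: (d,) gate bias. Returns a (d,) vector."""
    g = sigmoid(w_g @ np.concatenate([h_t, m_t]) + b_g)  # (d,), each in (0, 1)
    # Per-feature convex combination: g near 1 keeps the current
    # features, g near 0 falls back to the temporal state.
    return g * h_t + (1.0 - g) * m_t
```

Because each output element is a convex combination of the corresponding elements of `h_t` and `m_t`, the gate can interpolate smoothly between fast-changing spatial detail and slowly varying temporal context.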