🤖 AI Summary
To address critical challenges in ground-based remote sensing cloud image sequence extrapolation—including insufficient adaptive multi-scale feature extraction, weak long-range spatiotemporal dependency modeling, and high computational overhead of attention mechanisms—this paper proposes the Unified Spatiotemporal Fusion Network (USF-Net). Methodologically, USF-Net introduces a spatiotemporal guidance module that integrates adaptive large-kernel convolutions with a low-complexity attention mechanism to enable dynamic multi-scale contextual capture and long-term temporal modeling. It further incorporates state-space models, optical-flow-guided spatiotemporal dependency learning, temporal gating, and dynamic sparsity to suppress ghosting artifacts and enhance prediction coherence and sharpness. Additionally, the authors construct ASI-CIS, a high-quality cloud image dataset. Experiments demonstrate that USF-Net achieves a superior accuracy–efficiency trade-off, significantly outperforming state-of-the-art methods and establishing a new, efficient, and reliable paradigm for cloud image extrapolation in photovoltaic forecasting.
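The paper does not spell out its attention variant, but the "low-complexity attention mechanism" it claims can be illustrated with a standard kernelized linear-attention sketch: by applying a positive feature map to queries and keys and computing K^T V first, the cost drops from O(N²) in sequence length to O(N). The function name and shapes below are illustrative, not the paper's API.

```python
import numpy as np

def linear_attention(q, k, v):
    """O(N) attention sketch: map Q and K through a positive feature map,
    then compute (K^T V) first so the cost is linear in sequence length N
    rather than quadratic. q, k, v: (N, d) arrays."""
    # elu(x) + 1 keeps features positive, as in kernelized attention
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(q), phi(k)
    kv = k.T @ v                  # (d, d) summary — independent of N
    z = q @ k.sum(axis=0)         # (N,) normalizer
    return (q @ kv) / z[:, None]  # (N, d)
```

Because matrix multiplication is associative, this produces exactly the same output as the quadratic formulation `(phi(Q) phi(K)^T) V` with row normalization, just at lower cost.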
📝 Abstract
Ground-based remote sensing cloud image sequence extrapolation is a key research area in the development of photovoltaic power systems. However, existing approaches exhibit several limitations: (1) they primarily rely on static kernels to augment feature information, lacking adaptive mechanisms to dynamically extract features at varying resolutions; (2) temporal guidance is insufficient, leading to suboptimal modeling of long-range spatiotemporal dependencies; and (3) the quadratic computational cost of attention mechanisms is often overlooked, limiting efficiency in practical deployment. To address these challenges, we propose USF-Net, a Unified Spatiotemporal Fusion Network that integrates adaptive large-kernel convolutions and a low-complexity attention mechanism, combining temporal flow information within an encoder-decoder framework. Specifically, the encoder employs three basic layers to extract features, followed by the USTM, which comprises: (1) a SiB equipped with an SSM that dynamically captures multi-scale contextual information, and (2) a TiB featuring a TAM that effectively models long-range temporal dependencies while maintaining computational efficiency. In addition, a DSM with a TGM is introduced to enable unified modeling of temporally guided spatiotemporal dependencies. On the decoder side, a DUM is employed to address the common "ghosting effect"; it utilizes the initial temporal state as an attention operator to preserve critical motion signatures. As a key contribution, we also introduce and release the ASI-CIS dataset. Extensive experiments on ASI-CIS demonstrate that USF-Net significantly outperforms state-of-the-art methods, establishing a superior balance between prediction accuracy and computational efficiency for ground-based cloud extrapolation. The dataset and source code will be available at https://github.com/she1110/ASI-CIS.
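The abstract mentions a temporal gating mechanism (TGM) without giving its form. A common way to realize temporal gating, shown below as a minimal sketch (the weight shapes and blending rule are assumptions, not the paper's definition), is a learned sigmoid gate that decides, per feature, how much of an aggregated temporal state to blend into the current frame's features:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def temporal_gate(h_t, m_t, w_g, b_g):
    """Sigmoid-gated blend of current features h_t with an aggregated
    temporal state m_t. h_t, m_t: (d,) vectors; w_g: (d, 2d) gate
    weights; b_g: (d,) gate bias. Returns a (d,) vector."""
    g = sigmoid(w_g @ np.concatenate([h_t, m_t]) + b_g)  # (d,), each in (0, 1)
    # Per-feature convex combination: g near 1 keeps the current
    # features, g near 0 falls back to the temporal state.
    return g * h_t + (1.0 - g) * m_t
```

Because each output element is a convex combination of the corresponding elements of `h_t` and `m_t`, the gate can interpolate smoothly between fast-changing spatial detail and slowly varying temporal context.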