USF-Net: A Unified Spatiotemporal Fusion Network for Ground-Based Remote Sensing Cloud Image Sequence Extrapolation

📅 2025-11-12
🤖 AI Summary
To address key challenges in ground-based remote sensing cloud image sequence extrapolation (insufficient adaptive multi-scale feature extraction, weak long-range spatiotemporal dependency modeling, and the high computational overhead of attention mechanisms), this paper proposes the Unified Spatiotemporal Fusion Network (USF-Net). USF-Net introduces a spatiotemporal guidance module that integrates adaptive large-kernel convolutions with a low-complexity attention mechanism to capture multi-scale context dynamically and to model long-term temporal dependencies. It further incorporates state-space models, optical-flow-guided spatiotemporal dependency learning, temporal gating, and dynamic sparsity to suppress ghosting artifacts and improve the coherence and sharpness of predictions. The authors also construct ASI-CIS, a high-quality cloud image dataset. Experiments show that USF-Net achieves a better accuracy-efficiency trade-off than state-of-the-art methods, offering an efficient and reliable approach to cloud image extrapolation for photovoltaic forecasting.

📝 Abstract
Ground-based remote sensing cloud image sequence extrapolation is a key research area in the development of photovoltaic power systems. However, existing approaches exhibit several limitations: (1) they primarily rely on static kernels to augment feature information, lacking adaptive mechanisms to extract features at varying resolutions dynamically; (2) temporal guidance is insufficient, leading to suboptimal modeling of long-range spatiotemporal dependencies; and (3) the quadratic computational cost of attention mechanisms is often overlooked, limiting efficiency in practical deployment. To address these challenges, we propose USF-Net, a Unified Spatiotemporal Fusion Network that integrates adaptive large-kernel convolutions and a low-complexity attention mechanism, combining temporal flow information within an encoder-decoder framework. Specifically, the encoder employs three basic layers to extract features, followed by the USTM, which comprises: (1) a SiB equipped with an SSM that dynamically captures multi-scale contextual information, and (2) a TiB featuring a TAM that effectively models long-range temporal dependencies while maintaining computational efficiency. In addition, a DSM with a TGM is introduced to enable unified modeling of temporally guided spatiotemporal dependencies. On the decoder side, a DUM is employed to address the common "ghosting effect." It utilizes the initial temporal state as an attention operator to preserve critical motion signatures. As a key contribution, we also introduce and release the ASI-CIS dataset. Extensive experiments on ASI-CIS demonstrate that USF-Net significantly outperforms state-of-the-art methods, establishing a superior balance between prediction accuracy and computational efficiency for ground-based cloud extrapolation. The dataset and source code will be available at https://github.com/she1110/ASI-CIS.
Problem

Research questions and friction points this paper is trying to address.

Extrapolating ground-based remote sensing cloud image sequences for photovoltaic systems
Addressing limitations in adaptive feature extraction and temporal dependency modeling
Overcoming computational inefficiency in attention mechanisms for cloud prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive large-kernel convolutions for multi-scale feature extraction
Low-complexity attention mechanism for efficient temporal modeling
Unified spatiotemporal fusion with temporally guided motion signatures
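
To make the "low-complexity attention" point concrete, the following is a minimal sketch of one generic way to avoid the quadratic cost of attention: kernelized (linear) attention with the feature map elu(x) + 1. This is an illustrative assumption, not the TAM described in the paper, whose exact formulation is not given on this page. All function names (`phi`, `quadratic_attention`, `linear_attention`) are hypothetical. The sketch shows that summing key-value products once, and reusing that summary for every query, gives the same output as building the full N x N similarity matrix.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed, commonly used choice)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(Q, K, V):
    """Reference form: every query is compared with every key, O(N^2 * d)."""
    out = []
    for q in Q:
        fq = phi(q)
        weights = [dot(fq, phi(k)) for k in K]  # one row of the N x N matrix
        norm = sum(weights)
        out.append([
            sum(w * v[c] for w, v in zip(weights, V)) / norm
            for c in range(len(V[0]))
        ])
    return out


def linear_attention(Q, K, V):
    """Same result, but keys and values are summarized once, O(N * d * d_v)."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d                       # running sum of phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        norm = dot(fq, z)  # strictly positive because phi(x) > 0
        out.append([
            sum(fq[a] * S[a][c] for a in range(d)) / norm
            for c in range(dv)
        ])
    return out
```

In this sketch, `Q`, `K`, and `V` are lists of per-token vectors (for example, one vector per frame or per spatial patch). The cost of `linear_attention` grows linearly with the number of tokens, which is the general property a low-complexity attention mechanism aims for; the actual design used in USF-Net may differ.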
Penghui Niu
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
Taotao Cai
University of Southern Queensland
Jiashuai She
School of Electrical Engineering, Hebei University of Technology, Tianjin 300401, China
Yajuan Zhang
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
Junhua Gu
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; Hebei Province Key Laboratory of Big Data Calculation, Hebei University of Technology, Tianjin 300401, China
Ping Zhang
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; Hebei Province Key Laboratory of Big Data Calculation, Hebei University of Technology, Tianjin 300401, China
Jungong Han
Chair Professor in Computer Vision, University of Sheffield, UK, FIAPR, FAAIA
Computer Vision, Video Analytics, Machine Learning
Jianxin Li
Discipline of Business Systems and Operations, School of Business and Law, Edith Cowan University, Joondalup, WA 6027, Australia