🤖 AI Summary
To address insufficient multi-scale feature extraction, inefficient attention mechanisms, and inadequate global dependency modeling in ground-based cloud image segmentation, this paper proposes MPCM-Net, a lightweight network that integrates partial attention convolutions (ParCM/ParAM) with the Mamba state space model. We design a Partial Selection Module (ParSM) to enhance channel-wise interaction and cross-scale spatial response, construct a linear-complexity M2B decoder architecture, and introduce an SSH dual-path aggregation module to strengthen hierarchical feature correlation. Additionally, we release CSRC, a high-quality, fine-grained cloud image segmentation dataset. Experiments on CSRC show that our method outperforms state-of-the-art approaches, with a 3.2% accuracy gain and 2.1× faster inference. To the best of our knowledge, this is the first work to achieve high accuracy and low latency simultaneously in cloud image segmentation, providing reliable support for real-time applications such as photovoltaic power forecasting.
📝 Abstract
Ground-based cloud image segmentation is a critical research domain for photovoltaic power forecasting. Current deep learning approaches primarily focus on encoder-decoder architectural refinements. However, existing methods exhibit several limitations: (1) they rely on dilated convolutions for multi-scale context extraction, which limits the effectiveness of partial features and the interaction between channels; (2) attention-based feature enhancement implementations neglect the balance between accuracy and throughput; and (3) decoder modifications fail to establish global interdependencies among hierarchical local features, which limits inference efficiency. To address these challenges, we propose MPCM-Net, a Multi-scale network that integrates Partial attention Convolutions with Mamba architectures to improve both segmentation accuracy and computational efficiency. Specifically, the encoder incorporates MPAC, which comprises: (1) an MPC block with ParCM and ParSM that enables global spatial interaction across multi-scale cloud formations, and (2) an MPA block combining ParAM and ParSM to extract discriminative features with reduced computational complexity. On the decoder side, an M2B is employed to mitigate contextual loss through an SSHD that maintains linear complexity while enabling deep feature aggregation across spatial and scale dimensions. As a key contribution to the community, we also introduce and release the CSRC dataset, a clearly labeled, fine-grained segmentation benchmark designed to overcome the critical limitations of existing public datasets. Extensive experiments on CSRC demonstrate the superior performance of MPCM-Net over state-of-the-art methods, achieving an optimal balance between segmentation accuracy and inference speed. The dataset and source code will be available at https://github.com/she1110/CSRC.
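The linear-complexity claim for the Mamba-based decoder can be illustrated with a toy state space recurrence. The sketch below is not the authors' M2B or SSHD; it assumes a scalar, time-invariant state space model with hypothetical parameters `a`, `b`, and `c`, whereas Mamba makes these parameters input-dependent. It shows only that one left-to-right scan over a sequence of length L reproduces the output of an explicit long convolution that costs O(L^2).

```python
from typing import List, Sequence


def ssm_scan(x: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t, computed in one O(L) pass."""
    h = 0.0  # hidden state, initialised to zero (assumption)
    y = []
    for x_t in x:
        h = a * h + b * x_t  # state update uses only the previous state
        y.append(c * h)      # readout
    return y


def ssm_as_convolution(x: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Same model written as a causal convolution with kernel k_j = c * a**j * b (O(L^2))."""
    n = len(x)
    kernel = [c * (a ** j) * b for j in range(n)]
    return [sum(kernel[j] * x[t - j] for j in range(t + 1)) for t in range(n)]


if __name__ == "__main__":
    signal = [1.0, 0.0, 0.0, 2.0]
    print(ssm_scan(signal, a=0.5, b=1.0, c=1.0))
    print(ssm_as_convolution(signal, a=0.5, b=1.0, c=1.0))
```

Both functions return the same sequence. The scan visits each element once and keeps a fixed-size state, which is the property that lets state space decoders model long-range dependencies without the quadratic cost of pairwise attention.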