🤖 AI Summary
Existing crack detection methods struggle to combine pixel-level accuracy, effective local texture modeling, and long-range pixel dependency capture, and their large parameter counts and high computational overhead hinder edge deployment. To address these challenges, this paper proposes CrackSCF, a lightweight pixel-wise segmentation framework. We design a staircase cascaded fusion module that jointly handles efficient local pattern recognition and long-range dependency modeling; introduce a lightweight convolutional block that replaces every convolution in the network to significantly reduce computational cost; and integrate multi-scale contextual modeling with noise-robust enhancement mechanisms. Evaluated on our newly established TUT benchmark and five public datasets, the method achieves state-of-the-art performance, with F1 = 0.8382 and mIoU = 0.8473 on TUT, while requiring the least computational resources (parameters and FLOPs) among the compared methods.
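To give a rough sense of how replacing standard convolutions can cut the parameter count, here is a minimal arithmetic sketch. It assumes a depthwise separable convolution as the lightweight stand-in; the summary does not say how the paper's lightweight block is built, so this is only an illustration of the general idea, and the function names and channel sizes are hypothetical.

```python
# Illustrative only: depthwise separable convolution is an ASSUMED stand-in,
# not the lightweight block described in the paper.

def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise conv followed by a 1 x 1 pointwise conv (no bias)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
ratio = separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.3f}")
```

For this hypothetical layer the separable variant needs roughly 12% of the weights of the standard convolution, which is the kind of saving that makes edge deployment more practical.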
📝 Abstract
Detecting cracks in key structures with pixel-level precision is a significant challenge, as existing methods struggle to effectively integrate the local textures and pixel dependencies of cracks. Furthermore, these methods often have numerous parameters and substantial computational requirements, which complicates deployment on edge control devices. In this paper, we propose a staircase cascaded fusion crack segmentation network (CrackSCF) that generates high-quality crack segmentation maps using minimal computational resources. We construct a staircase cascaded fusion module that effectively captures the local patterns of cracks and the long-range dependencies of pixels while suppressing background noise. To reduce the computational resources required by the model, we introduce a lightweight convolution block that replaces all convolution operations in the network, significantly reducing computation and parameters without affecting the network's performance. To evaluate our method, we created a challenging benchmark dataset called TUT and conducted experiments on this dataset and five other public datasets. The experimental results indicate that our method offers significant advantages over existing methods, especially in handling background noise interference and in segmenting fine crack details. The F1 and mIoU scores on the TUT dataset are 0.8382 and 0.8473, respectively, achieving state-of-the-art (SOTA) performance while requiring the least computational resources. The code and dataset are available at https://github.com/Karl1109/CrackSCF.
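For readers unfamiliar with the two reported metrics, the sketch below computes F1 and mIoU for a pair of binary masks. It assumes mIoU is the mean of the crack-class and background-class IoU, a common convention in binary segmentation; the abstract does not state the exact evaluation protocol, so the released code may differ. The masks and function names are made up for illustration.

```python
# Minimal sketch of F1 and mIoU for binary crack masks (1 = crack, 0 = background).
# ASSUMPTION: mIoU averages the IoU of the crack class and the background class.

def confusion_counts(pred, truth):
    """Count true/false positives/negatives over two equally sized 2D masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_iou(tp, fp, fn, tn):
    """Average of crack IoU and background IoU."""
    crack_union = tp + fp + fn
    background_union = tn + fp + fn
    iou_crack = tp / crack_union if crack_union else 0.0
    iou_background = tn / background_union if background_union else 0.0
    return (iou_crack + iou_background) / 2


# Toy 2 x 4 masks, invented for this example.
pred = [[1, 1, 0, 0],
        [0, 1, 0, 0]]
truth = [[1, 0, 0, 0],
         [0, 1, 1, 0]]

counts = confusion_counts(pred, truth)
f1 = f1_score(counts[0], counts[1], counts[2])
miou = mean_iou(*counts)
print(counts, round(f1, 4), round(miou, 4))
```

Because background pixels dominate most crack images, the background IoU is usually high, so mIoU defined this way tends to be less sensitive to thin-crack errors than F1 on the crack class alone.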