🤖 AI Summary
Existing lightweight models for real-time steel surface defect detection carry high computational overhead and offer little cross-scale feature interaction. This paper proposes GMBINet, a lightweight network built around the Group Multiscale Bidirectional Interactive (GMBI) module. GMBI extracts multiscale features group-wise, which keeps its computational complexity scale-agnostic, and it strengthens cross-scale interaction through a Bidirectional Progressive Feature Interactor (BPFI) and a parameter-free Element-Wise Multiplication-Summation (EWMS) operation, without introducing additional computational overhead. Combined with depthwise separable convolution (DSConv), this yields an efficient multibranch architecture. With only 0.19M parameters, GMBINet runs at 1048 FPS on GPU and 16.53 FPS on CPU at 512 resolution while delivering competitive accuracy on the SD-Saliency-900 and NRSD-MN datasets. Additional results on the NEU-CLS defect classification dataset indicate that the method generalizes beyond surface defect detection, supporting its use in practical industrial deployment.
📝 Abstract
Real-time surface defect detection is critical for maintaining product quality and production efficiency in the steel manufacturing industry. Despite promising accuracy, existing deep learning methods often suffer from high computational complexity and slow inference speeds, which limit their deployment in resource-constrained industrial environments. Recent lightweight approaches adopt multibranch architectures based on depthwise separable convolution (DSConv) to capture multiscale contextual information. However, these methods often suffer from increased computational overhead and lack effective cross-scale feature interaction, limiting their ability to fully leverage multiscale representations. To address these challenges, we propose GMBINet, a lightweight framework that enhances multiscale feature extraction and interaction through novel Group Multiscale Bidirectional Interactive (GMBI) modules. The GMBI adopts a group-wise strategy for multiscale feature extraction, ensuring scale-agnostic computational complexity. It further integrates a Bidirectional Progressive Feature Interactor (BPFI) and a parameter-free Element-Wise Multiplication-Summation (EWMS) operation to enhance cross-scale interaction without introducing additional computational overhead. Experiments on SD-Saliency-900 and NRSD-MN datasets demonstrate that GMBINet delivers competitive accuracy with real-time speeds of 1048 FPS on GPU and 16.53 FPS on CPU at 512 resolution, using only 0.19 M parameters. Additional evaluations on the NEU-CLS defect classification dataset further confirm the strong generalization ability of our method, demonstrating its potential for broader industrial vision applications beyond surface defect detection. The dataset and code are publicly available at: https://github.com/zhangyongcode/GMBINet.
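The claim of scale-agnostic complexity can be made concrete with a little weight-count arithmetic. The sketch below is not taken from the GMBINet code. It assumes a conventional multibranch design applies one depthwise convolution per scale to all channels, whereas a group-wise design splits the channels evenly and gives each group a single depthwise convolution. The function names, the 64-channel width, and the 3 x 3 kernel are hypothetical choices for illustration only.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with the given grouping (bias omitted)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


def dsconv_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise


def multibranch_depthwise_params(channels, k, num_scales):
    """Assumed baseline: every scale branch filters all channels."""
    return num_scales * conv_params(channels, channels, k, groups=channels)


def groupwise_depthwise_params(channels, k, num_scales):
    """Assumed group-wise variant: channels are split evenly across the scales."""
    assert channels % num_scales == 0
    per_group = channels // num_scales
    return num_scales * conv_params(per_group, per_group, k, groups=per_group)


if __name__ == "__main__":
    channels, k = 64, 3  # hypothetical layer width and kernel size
    print("standard conv:", conv_params(channels, channels, k))  # 36864
    print("DSConv:       ", dsconv_params(channels, channels, k))  # 4672
    for scales in (1, 2, 4, 8):
        print(
            f"scales={scales}",
            "multibranch:", multibranch_depthwise_params(channels, k, scales),
            "group-wise:", groupwise_depthwise_params(channels, k, scales),
        )
```

Under these assumptions the multibranch count grows linearly with the number of scales, while the group-wise count stays at 576 weights however many scales are used. Dilation, a common way to vary the receptive field per group, would not change either count. The actual GMBI layout may differ in its details.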