🤖 AI Summary
Existing crack segmentation methods, predominantly built upon generic semantic segmentation architectures, struggle to preserve weak structural cues, maintain directional continuity, and suppress background interference. This work reframes crack segmentation as a sparse structural recovery problem and introduces RIFT, a lightweight model family that explicitly encodes morphological priors (elongation, sparsity, and anisotropy) through local evidence preservation, cooperative directional continuity aggregation, and lightweight multi-scale fusion. Against reproduced representative baselines on four public datasets, RIFT achieves the best or tied-best results across the 16 main metrics: RIFT-B gives the strongest overall accuracy, while RIFT-T, with only 0.47M parameters, offers the best deployment efficiency and high inference speed.
📝 Abstract
Recent crack segmentation methods often follow generic semantic segmentation designs, using stronger backbones, hybrid CNN-Transformer-Mamba encoders, and auxiliary enhancement branches. Although these designs are effective, they raise the question of whether stronger generic feature mixing is the most suitable direction for crack segmentation. We instead formulate crack segmentation as sparse structural recovery. Cracks have limited category-level semantics but strong morphological regularities: they are thin, sparse, anisotropic, locally fragmented, and easily confused with textures or shadows. Thus, the key bottleneck lies in preserving weak structural evidence, recovering directional continuity, and suppressing background coupling. We propose RIFT, a compact family of morphology-aligned crack segmentation models. Rather than compressing a complex generic architecture, RIFT is simple by design: it preserves local evidence, aggregates cooperative directional continuity, and restores crack structures through lightweight multi-scale fusion. Experiments on four public benchmarks show that RIFT achieves the best or tied-best results across the 16 main metrics against reproduced representative baselines. RIFT-B gives the strongest overall accuracy, while RIFT-T provides the best deployment efficiency with only 0.47M parameters and high inference speed. Topology-aware evaluation, ablations, transfer experiments, and visualizations further verify that task-aligned simplicity can match or surpass complex hybrid architectures when its inductive bias fits crack morphology. Code: https://github.com/xauat-liushipeng/RIFT
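To make the anisotropy prior concrete, the toy sketch below averages evidence along short line segments in four orientations and keeps the strongest one. It is an illustration only and is not taken from RIFT: the function name `directional_response`, the `half_len` parameter, and the 7x7 evidence map are all hypothetical, and the actual modules should be read from the linked repository. The point it shows is that a one-pixel gap inside a crack collects support from its collinear neighbours, while an isolated speck does not.

```python
# Illustrative assumption: a hand-written directional aggregation on a toy grid.
# This is NOT RIFT's module; all names and values here are hypothetical.

# (row step, column step) for horizontal, vertical, and the two diagonals
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def directional_response(evidence, half_len=2):
    """Average evidence along a short segment in each direction and keep
    the strongest direction per pixel. Out-of-bounds samples count as 0."""
    h, w = len(evidence), len(evidence[0])
    n = 2 * half_len + 1  # number of samples per segment
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            best = 0.0
            for dr, dc in DIRECTIONS:
                total = 0.0
                for t in range(-half_len, half_len + 1):
                    rr, cc = r + t * dr, c + t * dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += evidence[rr][cc]
                best = max(best, total / n)
            out[r][c] = best
    return out


# Toy evidence map: a horizontal crack on row 3 with a one-pixel gap at
# column 3, plus an isolated background speck at (0, 6).
evidence = [[0.0] * 7 for _ in range(7)]
for col in range(7):
    evidence[3][col] = 1.0
evidence[3][3] = 0.0  # fragmented crack: missing evidence at the gap
evidence[0][6] = 1.0  # background speck with no collinear support

resp = directional_response(evidence)
print("gap response:  ", resp[3][3])
print("speck response:", resp[0][6])
```

In this toy setting the gap pixel scores higher than the speck because four of its five horizontal samples carry evidence, whereas the speck is supported only by itself in every direction. A learned model would replace the fixed averaging with trainable directional filters, but the inductive bias is the same.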