🤖 AI Summary
This work addresses the high computational cost, large memory footprint, and boundary information leakage often associated with context modules (e.g., ASPP, PPM) in CNN-based semantic segmentation models. Rather than proposing a new architecture, the authors revisit Directional Geometric Mamba (G-Mamba) from DGM-Net and evaluate this directional state space model as a plug-and-play context aggregation module. By injecting boundary and centripetal-flow cues into the selective scan mechanism, G-Mamba uses geometric priors to regulate long-range feature propagation. The module replaces the context heads of six mainstream CNN segmentation models while keeping the ResNet-101 backbone unchanged, and it consistently improves their mIoU on Cityscapes at 1024×1024 resolution with only moderate extra computation (measured in GFLOPs).
📝 Abstract
CNN-based semantic segmentation networks usually rely on context heads such as ASPP, PPM, or attention modules to enlarge the receptive field. These heads are effective but can incur heavy computation and memory costs or cause boundary leakage. This paper revisits Directional Geometric Mamba (G-Mamba) from DGM-Net and studies it as a plug-and-play context aggregation module rather than as a complete new segmentation architecture. The key idea is to inject geometric guidance into the selective scan process so that long-range feature propagation is modulated by boundary and centripetal-flow cues. We replace the original context heads of six representative CNN segmentation models (DeepLabV3+, DANet, CCNet, PSPNet, PSANet, and OCRNet) while keeping the ResNet-101 backbone unchanged. Results on Cityscapes at $1024\times1024$ resolution show consistent mIoU gains with only moderate extra GFLOPs, suggesting that geometry-guided state space model (SSM) modules can serve as practical alternatives or enhancements to conventional CNN context heads.
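The abstract does not give G-Mamba's update equations. As a rough illustration of what "injecting geometric guidance into the selective scan" could look like, the NumPy sketch below runs a simplified scan recurrence $h_t = a_t h_{t-1} + \Delta x_t$ in four directions and damps the decay $a_t$ with a boundary map, so a confident boundary cuts off propagation across it. A flow field then weights each direction per pixel. Everything here is a hypothetical assumption for illustration, not DGM-Net's implementation: the gating rule $a_t = e^{-\Delta A}(1 - b_t)$, the fixed step size `delta`, the scalar decay `A`, the flow-alignment weighting, and the function names `gated_scan_1d`, `scan_rows`, and `directional_context`.

```python
import numpy as np

def gated_scan_1d(x, boundary, delta=0.5, A=1.0):
    # x: (L, C) features along one scan line; boundary: (L,) values in [0, 1].
    # Simplified recurrence h_t = a_t * h_{t-1} + delta * x_t. The usual discretized
    # decay exp(-delta * A) is damped by (1 - boundary_t); this gate is an assumption.
    h = np.zeros(x.shape[1])
    y = np.empty_like(x, dtype=float)
    decay = np.exp(-delta * A)
    for t in range(x.shape[0]):
        a_t = decay * (1.0 - boundary[t])  # a_t = 0 at a full boundary: state is reset
        h = a_t * h + delta * x[t]
        y[t] = h
    return y

def scan_rows(feat, boundary, reverse=False):
    # Apply the 1D scan to every row of an (H, W, C) map, left-to-right by default,
    # or right-to-left when reverse=True.
    out = np.empty_like(feat, dtype=float)
    for i in range(feat.shape[0]):
        f, b = feat[i], boundary[i]
        if reverse:
            out[i] = gated_scan_1d(f[::-1], b[::-1])[::-1]
        else:
            out[i] = gated_scan_1d(f, b)
    return out

def directional_context(feat, boundary, flow, eps=1e-3):
    # feat: (H, W, C); boundary: (H, W); flow: (H, W, 2) vectors stored as (dy, dx).
    # Four scans (L->R, R->L, T->B, B->T) keyed by their (dy, dx) direction. Each
    # output is weighted per pixel by its clipped alignment with the flow cue
    # (a hypothetical stand-in for the centripetal-flow guidance).
    ft, bt = feat.transpose(1, 0, 2), boundary.T
    outs = {
        (0, 1): scan_rows(feat, boundary),
        (0, -1): scan_rows(feat, boundary, reverse=True),
        (1, 0): scan_rows(ft, bt).transpose(1, 0, 2),
        (-1, 0): scan_rows(ft, bt, reverse=True).transpose(1, 0, 2),
    }
    weights = {d: np.maximum(flow[..., 0] * d[0] + flow[..., 1] * d[1], 0.0) + eps
               for d in outs}
    total = sum(weights.values())
    return sum(outs[d] * (weights[d] / total)[..., None] for d in outs)

# Usage: a boundary at index 2 resets the state, so positions 0 and 2 match.
feat = np.ones((1, 4, 1))
boundary = np.array([[0.0, 0.0, 1.0, 0.0]])
print(scan_rows(feat, boundary)[0, :, 0])  # approximately 0.5, 0.803, 0.5, 0.803
```

The boundary gate is multiplicative on the decay rather than on the input, so a pixel just past a boundary still receives its own features but none of the context accumulated on the other side. This is one simple way to express "reduced boundary leakage" in a scan; the paper's actual mechanism may differ.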