🤖 AI Summary
U-Net suffers from ambiguous boundaries and weak coordination between global semantics and local detail, particularly with complex backgrounds, long-range dependencies, and multi-scale objects. To address this, the paper proposes Attention-Fused U-Net (AFUNet), presented as the first architecture to integrate dual-path Convolutional Block Attention Modules (CBAM) together with a weighted multi-scale feature fusion strategy into the skip connections. The design strengthens joint channel-spatial attention modeling and cross-scale feature interaction, which improves suppression of background interference and the accuracy of boundary localization. On the Cityscapes dataset, AFUNet reaches 76.5% mean Intersection-over-Union (mIoU) and 95.3% pixel accuracy, outperforming FCN, SegNet, DeepLabv3+, and PSPNet. The method is positioned for high-precision semantic segmentation in autonomous driving, remote sensing, and medical imaging.
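The idea of attention applied on a skip connection can be illustrated with a small NumPy sketch. It is not the paper's implementation: the summary does not specify the dual-path layout or the fusion weights, so the sequential channel-then-spatial ordering (as in the original CBAM design), the softmax-normalized fusion weights, and every name below (`attended_skip`, `fusion_logits`, `w1`, `w2`, `kernel`) are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """CBAM-style channel attention for a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); together they
    form the shared two-layer MLP. Returns weights of shape (C, 1, 1).
    """
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear -> ReLU -> linear

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]


def spatial_attention(feat, kernel):
    """CBAM-style spatial attention.

    kernel has shape (2, k, k) with k odd. Returns weights of shape (1, H, W).
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):  # naive "same" convolution, kept explicit for clarity
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]


def attended_skip(skip_feats, fusion_logits, w1, w2, kernel):
    """Hypothetical skip-connection block.

    skip_feats is a list of (C, H, W) maps already resized to a common
    resolution. They are blended with softmax weights (an assumption),
    then reweighted by channel attention followed by spatial attention.
    """
    weights = np.exp(fusion_logits - fusion_logits.max())
    weights = weights / weights.sum()
    fused = sum(wt * f for wt, f in zip(weights, skip_feats))
    fused = fused * channel_attention(fused, w1, w2)
    return fused * spatial_attention(fused, kernel)
```

Because both attention maps lie between 0 and 1, the block can only attenuate the fused features, which is one way to read the claim about suppressing background interference. In a trained network `w1`, `w2`, `kernel`, and `fusion_logits` would be learned parameters rather than fixed arrays.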
📝 Abstract
Semantic segmentation is a core task in computer vision whose goal is to classify every pixel in an image accurately. The conventional U-Net model extracts and fuses features efficiently through its encoder-decoder structure, but it remains limited when handling complex backgrounds, long-range dependencies, and multi-scale targets. To address these limitations, this paper proposes an improved U-Net that incorporates an attention mechanism: channel attention and spatial attention modules strengthen the model's focus on important features, and a multi-scale feature fusion strategy optimizes the skip connections so that global semantic information is better combined with fine-grained features. Experiments on the Cityscapes dataset compare the improved model with classic models such as FCN, SegNet, DeepLabv3+, and PSPNet. The improved model reaches an mIoU of 76.5% and a pixel accuracy (PA) of 95.3%. These results support the method's advantage in complex scenes and on blurred target boundaries. The paper also discusses the model's potential in practical applications and directions for future extension, suggesting broad application value in fields such as autonomous driving, remote sensing image analysis, and medical image processing.
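For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mIoU and pixel accuracy are commonly computed from a confusion matrix. The function names and the optional `ignore_label` parameter are illustrative assumptions; the abstract does not describe the paper's exact evaluation protocol.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_label=None):
    """Count (true class, predicted class) pairs over flattened pixel labels."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:
            continue  # skip pixels excluded from evaluation (assumed convention)
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of evaluated pixels whose predicted class matches the label."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, labelled otherwise
        fn = sum(cm[c]) - tp                       # labelled c, predicted otherwise
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(labels, preds, num_classes=3)
pa = pixel_accuracy(cm)   # 4 of 6 pixels correct
miou = mean_iou(cm)       # per-class IoU: 1/3, 2/3, 1/2
```

In the toy example the pixel accuracy (about 0.67) is higher than the mIoU (0.5). Pixel accuracy is dominated by frequent classes, while mIoU weights every class equally, which is consistent with the reported PA of 95.3% sitting well above the mIoU of 76.5%.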