Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation

📅 2025-02-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

U-Net suffers from boundary ambiguity and from weak coordination between global semantics and local details, particularly under complex backgrounds, long-range dependencies, and multi-scale object segmentation. To address this, the paper proposes Attention-Fused U-Net (AFUNet). AFUNet is presented as the first architecture to jointly integrate dual-path Convolutional Block Attention Modules (CBAM) and a weighted multi-scale feature fusion strategy into the skip connections, strengthening joint channel-spatial attention modeling and cross-scale feature interaction. The design is reported to improve suppression of background interference and the accuracy of boundary localization. Evaluated on the Cityscapes dataset, AFUNet achieves 76.5% mean Intersection-over-Union (mIoU) and 95.3% pixel accuracy, outperforming FCN, SegNet, DeepLabv3+, and PSPNet. The authors also discuss its potential for high-precision semantic segmentation in autonomous driving, remote sensing, and medical imaging.
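
The attention and fusion ideas named in the summary can be illustrated with a small NumPy sketch. It is not the paper's implementation: it shows a plain CBAM-style block (channel attention followed by spatial attention) applied to one skip-connection feature map, plus a softmax-weighted sum of feature maps that are assumed to be already resized to a common shape. The function names, the reduction ratio `r`, the 7x7 kernel size, and the random weights are all illustrative assumptions; the dual-path arrangement mentioned above is not reproduced.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns (C, 1, 1) weights."""
    avg = feat.mean(axis=(1, 2))          # global average pooling per channel
    mx = feat.max(axis=(1, 2))            # global max pooling per channel

    def mlp(v):                           # shared two-layer MLP with ReLU
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

def spatial_attention(feat, kernel):
    """kernel: (2, k, k) with odd k. Returns (1, H, W) weights."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):                    # naive "same" cross-correlation
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]

def cbam(feat, w1, w2, kernel):
    """Channel attention first, then spatial attention on the reweighted map."""
    feat = feat * channel_attention(feat, w1, w2)
    return feat * spatial_attention(feat, kernel)

def fuse_scales(feats, logits):
    """Weighted sum of same-shape maps; weights are a softmax over logits.
    Assumption: inputs were already resampled to a common resolution."""
    weights = np.exp(logits - np.max(logits))
    weights = weights / weights.sum()
    return sum(wt * f for wt, f in zip(weights, feats))

# Illustrative usage with random (untrained) parameters.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
skip = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
refined = cbam(skip, w1, w2, kernel)
fused = fuse_scales([skip, refined], np.array([0.0, 1.0]))
```

Because both attention maps lie in (0, 1), the block can only attenuate activations; in a trained network the learned weights decide which channels and locations are kept.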

📝 Abstract

Semantic segmentation is a core task in computer vision whose goal is to assign a class label to every pixel in an image. The traditional Unet model achieves efficient feature extraction and fusion through an encoder-decoder structure, but it remains limited when handling complex backgrounds, long-range dependencies, and multi-scale targets. To address this, the paper proposes an improved Unet that incorporates an attention mechanism: channel attention and spatial attention modules strengthen the model's focus on important features, and a multi-scale feature fusion strategy optimizes the skip connections, improving how global semantic information is combined with fine-grained features. Experiments on the Cityscapes dataset compare the model with classic baselines such as FCN, SegNet, DeepLabv3+, and PSPNet. The improved model reaches 76.5% mIoU and 95.3% pixel accuracy (PA). The results support the method's advantage in complex scenes and on blurred target boundaries. The paper also discusses the model's potential in practical applications and directions for future extension, pointing to uses in autonomous driving, remote sensing image analysis, and medical image processing.
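
For readers unfamiliar with the two reported metrics, the sketch below computes pixel accuracy and mIoU from a confusion matrix using their standard definitions. It is a minimal pure-Python illustration, not the paper's evaluation code; the function names and the toy labels are assumptions, and it assumes void or ignored pixels were filtered out beforehand.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the ground truth."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN); absent classes are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp
        fn = sum(cm[c]) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Toy example: six flattened pixels, three classes (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
pa = pixel_accuracy(cm)    # 4 of 6 pixels correct
miou = mean_iou(cm)        # per-class IoU: 1/3, 2/3, 1/2
```

Pixel accuracy is dominated by large classes such as road or sky, while mIoU weights every class equally, which is why the two numbers in the abstract can differ so widely.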

Problem

Research questions and friction points this paper is trying to address.

Improving multi-scale semantic segmentation accuracy

Strengthening Unet with attention mechanisms

Optimizing feature fusion for complex scenes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Mechanism

Multi-Scale Fusion

Improved Unet Model

🔎 Similar Papers

SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation