An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and insufficient local detail modeling of Transformers in multi-organ medical image segmentation, this paper proposes LamFormer—a novel U-shaped architecture. Methodologically, it integrates three key innovations: (1) a linear-attention Mamba module to drastically reduce computational complexity for long-range dependency modeling; (2) an enhanced pyramid encoder coupled with a Parallel Hierarchical Feature Aggregation (PHFA) module to bridge semantic gaps across multi-scale features; and (3) a lightweight Reduced Transformer to jointly enhance local detail representation and global contextual awareness. Evaluated on seven mainstream medical imaging benchmarks, LamFormer consistently outperforms existing state-of-the-art methods—achieving superior segmentation accuracy while requiring significantly fewer parameters and lower FLOPs. It thus establishes a new Pareto-optimal trade-off between precision and efficiency in medical image segmentation.

📝 Abstract
In the field of multi-organ medical image segmentation, recent methods frequently employ Transformers to capture long-range dependencies from image features. However, these methods overlook the high computational cost of Transformers and their deficiencies in extracting local detailed information. To address these two shortcomings, we reassess the design of feature extraction modules and propose a new deep-learning network called LamFormer for fine-grained segmentation tasks across multiple organs. LamFormer is a novel U-shaped network that employs Linear Attention Mamba (LAM) in an enhanced pyramid encoder to capture multi-scale long-range dependencies. We construct the Parallel Hierarchical Feature Aggregation (PHFA) module to aggregate features from different layers of the encoder, narrowing the semantic gap among features while filtering information. Finally, we design the Reduced Transformer (RT), which utilizes a distinct computational approach to globally model up-sampled features. The RT enhances the extraction of detailed local information and improves the network's capability to capture long-range dependencies. LamFormer outperforms existing segmentation methods on seven complex and diverse datasets, demonstrating exceptional performance. Moreover, the proposed network achieves a balance between model performance and model complexity.
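The abstract does not spell out the exact formulation of the Linear Attention Mamba (LAM) module. As a rough illustration of the linear-attention idea such modules build on, here is a minimal NumPy sketch of kernelized linear attention, which replaces the softmax's O(N²·d) pairwise score matrix with an O(N·d²) factorization; the `elu(x) + 1` feature map and all function names are assumptions for illustration, not the paper's design:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized linear attention: O(N*d^2) instead of softmax's O(N^2*d).

    With a positive feature map phi, attention can be factorized as
    phi(Q) @ (phi(K).T @ V), so the (N, N) score matrix is never formed.
    """
    def phi(x):  # elu(x) + 1: keeps features strictly positive
        return np.where(x > 0, x + 1.0, np.exp(x))

    Qp, Kp = phi(Q), phi(K)                    # (N, d) each
    kv = Kp.T @ V                              # (d, d) key-value summary
    z = Qp @ Kp.sum(axis=0, keepdims=True).T   # (N, 1) normalizer
    return (Qp @ kv) / z                       # (N, d) attended output

# Toy token sequence: N = 6 tokens, d = 4 channels
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (6, 4)
```

Because the normalized weights are positive and sum to one per query, each output row is a convex combination of the value rows, mirroring softmax attention's behavior at a fraction of the cost for long sequences.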
Problem

Research questions and friction points this paper is trying to address.

Addresses high computational costs in medical image segmentation
Improves extraction of local detailed information from images
Captures multi-scale long-range dependencies for organ segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LamFormer uses Linear Attention Mamba for long-range dependency modeling
PHFA module aggregates multi-layer features to reduce gaps
Reduced Transformer globally models up-sampled features efficiently
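The paper does not detail how PHFA fuses encoder features. As a hedged stand-in, this NumPy sketch shows the generic pattern such modules follow: project each pyramid level to a shared channel width with a 1×1-style channel mix, upsample to the finest resolution, and sum; the nearest-neighbor upsampling and random projection matrices are illustrative assumptions, not the paper's actual PHFA:

```python
import numpy as np

def upsample_nearest(x, size):
    """Nearest-neighbor upsample of a (C, H, W) feature map to (C, size, size)."""
    C, H, W = x.shape
    rows = np.arange(size) * H // size
    cols = np.arange(size) * W // size
    return x[:, rows][:, :, cols]

def aggregate_pyramid(feats, proj):
    """Fuse pyramid features: channel-project each level, upsample to the
    finest resolution, and sum (a simplified stand-in for PHFA)."""
    target = max(f.shape[1] for f in feats)
    fused = 0
    for f, P in zip(feats, proj):
        C, H, W = f.shape
        # 1x1 "conv" as a channel-mixing matmul: (C_out, C_in) @ (C_in, H*W)
        p = (P @ f.reshape(C, -1)).reshape(P.shape[0], H, W)
        fused = fused + upsample_nearest(p, target)
    return fused

rng = np.random.default_rng(1)
# Three encoder stages: channels double as spatial resolution halves
feats = [rng.standard_normal((c, s, s)) for c, s in [(16, 32), (32, 16), (64, 8)]]
proj = [rng.standard_normal((16, c)) * 0.1 for c in (16, 32, 64)]
out = aggregate_pyramid(feats, proj)
print(out.shape)  # (16, 32, 32)
```

The real PHFA presumably adds learned filtering/gating on top of this aggregation to suppress noisy activations while narrowing the semantic gap between levels.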
Dayu Tan
Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601, China
Cheng Kong
Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601, China
Yansen Su
Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601, China
Hai Chen
Tsinghua University
Robust 3D vision, recommendation systems
Dongliang Yang
Second Department of Thoracic Surgery, Anhui Chest Hospital, Hefei 230022, China
Junfeng Xia
Anhui University
Bioinformatics
Chunhou Zheng
Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601, China