🤖 AI Summary
To address the high computational cost and insufficient local detail modeling of Transformers in multi-organ medical image segmentation, this paper proposes LamFormer—a novel U-shaped architecture. Methodologically, it integrates three key innovations: (1) a linear-attention Mamba module to drastically reduce computational complexity for long-range dependency modeling; (2) an enhanced pyramid encoder coupled with a Parallel Hierarchical Feature Aggregation (PHFA) module to bridge semantic gaps across multi-scale features; and (3) a lightweight Reduced Transformer to jointly enhance local detail representation and global contextual awareness. Evaluated on seven mainstream medical imaging benchmarks, LamFormer consistently outperforms existing state-of-the-art methods—achieving superior segmentation accuracy while requiring significantly fewer parameters and lower FLOPs. It thus establishes a new Pareto-optimal trade-off between precision and efficiency in medical image segmentation.
📝 Abstract
In multi-organ medical image segmentation, recent methods frequently employ Transformers to capture long-range dependencies from image features. However, these methods overlook the high computational cost of Transformers and their deficiencies in extracting local detail. To address both issues, we reassess the design of feature extraction modules and propose a new deep-learning network, LamFormer, for fine-grained segmentation across multiple organs. LamFormer is a novel U-shaped network that employs Linear Attention Mamba (LAM) in an enhanced pyramid encoder to capture multi-scale long-range dependencies. We construct the Parallel Hierarchical Feature Aggregation (PHFA) module to aggregate features from different encoder layers, narrowing the semantic gap among features while filtering information. Finally, we design the Reduced Transformer (RT), which uses a distinct computational approach to globally model up-sampled features; the RT enhances the extraction of local detail and improves the network's ability to capture long-range dependencies. LamFormer outperforms existing segmentation methods on seven complex and diverse datasets, demonstrating strong performance while achieving a favorable balance between accuracy and model complexity.
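The abstract does not spell out the internals of the LAM module, but the efficiency argument rests on linear attention replacing the quadratic cost of standard self-attention. A minimal sketch of generic kernel-feature-map linear attention (using an elu+1 feature map; this is an illustrative assumption, not the paper's exact LAM formulation): by computing the d×d key-value product once, the per-token cost becomes independent of sequence length n, so total cost is O(n·d²) rather than O(n²·d).

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Illustrative kernel-feature-map linear attention (assumed form).

    q, k: (n, d) queries/keys; v: (n, d_v) values.
    Softmax attention costs O(n^2 * d); here K^T V is computed once
    and reused for every query, giving O(n * d * d_v).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    q, k = phi(q), phi(k)
    kv = k.T @ v                   # (d, d_v): shared across all queries
    z = q @ k.sum(axis=0)          # (n,): per-query normalizer
    return (q @ kv) / (z[:, None] + eps)

# toy sequence: 512 tokens, 16-dim heads
rng = np.random.default_rng(0)
q = rng.standard_normal((512, 16))
k = rng.standard_normal((512, 16))
v = rng.standard_normal((512, 16))
out = linear_attention(q, k, v)
print(out.shape)  # (512, 16)
```

Because the feature map is strictly positive, each output row is a convex combination of the value rows, mirroring softmax attention's averaging behavior at linear cost.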