Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?

πŸ“… 2024-06-24
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 3
✨ Influential: 0
πŸ€– AI Summary
To address the high computational cost of Vision Transformer (ViT)-based models and the limited global modeling capability of CNN-based approaches in 3D medical image segmentation, this paper proposes U-VixLSTM, a novel architecture that integrates lightweight Vision-xLSTM modules into the UNet encoder-decoder framework. Local features are extracted via CNNs, while the xLSTM blocks capture cross-block spatiotemporal and long-range dependencies. The authors further introduce patch-wise temporal unfolding and gated state-update mechanisms to improve representational efficiency. Experimental results on the Synapse, ISIC, and ACDC datasets show that U-VixLSTM outperforms state-of-the-art methods, achieving 1.2–2.8% higher Dice scores, 23% faster inference, and 37% lower GPU memory consumption, along with a substantially smaller parameter count and memory footprint. This work points toward efficient, deployable 3D medical image segmentation.

πŸ“ Abstract
The development of efficient segmentation strategies for medical images has evolved from an initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures that are both high-performance and computationally efficient, and that can be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they face challenges from the corresponding high computational and storage costs. This paper investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM) units by introducing the novel U-VixLSTM. The Vision-xLSTM blocks capture temporal and global relationships within the patches extracted from the CNN feature maps. A convolutional feature-reconstruction path upsamples the output volume from the Vision-xLSTM blocks to produce the segmentation output. The primary objective is to establish Vision-xLSTM as an appropriate backbone for medical image segmentation, offering excellent performance at reduced computational cost. U-VixLSTM exhibits superior performance compared to state-of-the-art networks on the publicly available Synapse, ISIC and ACDC datasets. Code provided: https://github.com/duttapallabi2907/U-VixLSTM
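The data flow the abstract describes (CNN feature maps → patch sequence → gated recurrent mixing → globally-informed features) can be sketched schematically. This is a minimal NumPy toy, not the authors' implementation: `patchify` and `gated_recurrent_pass` are hypothetical helper names, and the simple input/forget-gate recurrence merely stands in for the far richer xLSTM block.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def patchify(feature_map, patch):
    # Split an (H, W, C) CNN feature map into a sequence of flattened patches,
    # the tokens over which the recurrent block operates.
    H, W, C = feature_map.shape
    seq = []
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            seq.append(feature_map[i:i + patch, j:j + patch].reshape(-1))
    return np.stack(seq)  # (num_patches, patch * patch * C)

def gated_recurrent_pass(seq, rng):
    # Toy gated recurrence over the patch sequence, standing in for an
    # xLSTM block: each patch updates a running state via input/forget
    # gates, so every output token carries information from earlier patches.
    d = seq.shape[1]
    Wi = rng.standard_normal((d, d)) * 0.01
    Wf = rng.standard_normal((d, d)) * 0.01
    state = np.zeros(d)
    out = []
    for x in seq:
        i_gate = sigmoid(x @ Wi)
        f_gate = sigmoid(x @ Wf)
        state = f_gate * state + i_gate * np.tanh(x)
        out.append(state.copy())
    return np.stack(out)  # same shape as seq

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 4))      # stand-in for a CNN encoder output
tokens = patchify(fmap, patch=2)           # 16 patches, each of length 2*2*4 = 16
mixed = gated_recurrent_pass(tokens, rng)  # globally mixed patch features
print(tokens.shape, mixed.shape)           # (16, 16) (16, 16)
```

In the real U-VixLSTM, the `mixed` features would then be reshaped back into a spatial volume and passed through the convolutional reconstruction path for upsampling to the segmentation output.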
Problem

Research questions and friction points this paper is trying to address.

Develop efficient medical image segmentation with hybrid CNN-Vision-xLSTM models
Reduce computational costs while maintaining high segmentation performance
Improve global dependency capture in 3D medical image analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines CNNs with Vision-xLSTM for segmentation
Uses Vision-xLSTM to capture temporal and global relationships
Upsamples features for efficient medical image segmentation
Pallabi Dutta
Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India
Soham Bose
Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
S. K. Roy
Department of Computer Science and Engineering, Alipurduar Government Engineering and Management College, West Bengal 736206, India
S. Mitra
Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India