🤖 AI Summary
To balance long-range dependency modeling against computational efficiency in 3D medical image segmentation, this paper proposes UNETVL (U-Net Vision-LSTM), a U-Net-based architecture that integrates Vision-LSTM (ViL) into both the encoder and decoder pathways to strengthen contextual modeling. It also introduces lightweight Chebyshev Kolmogorov-Arnold Networks (KANs) as replacements for conventional MLPs along the encoder-decoder skip connections, aiming to capture complex global patterns with fewer parameters. On the ACDC and AMOS2022 benchmarks, UNETVL achieves mean Dice scores 7.3% and 15.6% higher than UNETR, respectively, and outperforms the recent state-of-the-art methods it is compared against. The implementation is publicly available.
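For readers unfamiliar with the metric, the mean Dice score cited above can be sketched as follows. This is a minimal illustration in plain Python rather than the paper's evaluation code: the function names, the toy label lists, and the convention of scoring a label absent from both volumes as 1.0 are all assumptions.

```python
def dice_score(pred, target, label):
    """Dice = 2|P ∩ T| / (|P| + |T|) for one label over flattened voxel lists."""
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    denom = sum(p) + sum(t)
    if denom == 0:
        # Assumption: a label absent from both volumes counts as perfect agreement.
        return 1.0
    return 2.0 * intersection / denom


def mean_dice(pred, target, labels):
    """Average the per-label Dice over the foreground labels of interest."""
    return sum(dice_score(pred, target, lab) for lab in labels) / len(labels)


# Toy flattened "volumes" with background 0 and two foreground labels (hypothetical data).
pred = [0, 1, 1, 2, 2, 2, 0, 0]
target = [0, 1, 1, 1, 2, 2, 0, 2]
print(mean_dice(pred, target, labels=[1, 2]))
```

Averaging over foreground labels only, as done here, is one common convention; the paper may aggregate differently.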
📝 Abstract
3D medical image segmentation has progressed considerably due to Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), yet these methods struggle to balance long-range dependency acquisition with computational efficiency. To address this challenge, we propose UNETVL (U-Net Vision-LSTM), a novel architecture that leverages recent advancements in temporal information processing. UNETVL incorporates Vision-LSTM (ViL) for improved scalability and memory functions, alongside efficient Chebyshev Kolmogorov-Arnold Networks (KANs) to handle complex and long-range dependency patterns more effectively. We validated our method on the ACDC and AMOS2022 (post-challenge Task 2) benchmark datasets, showing a significant improvement in mean Dice score compared to recent state-of-the-art approaches, especially over its predecessor, UNETR, with increases of 7.3% on ACDC and 15.6% on AMOS. Extensive ablation studies were conducted to demonstrate the impact of each component in UNETVL, providing a comprehensive understanding of its architecture. Our code is available at https://github.com/tgrex6/UNETVL, facilitating further research and applications in this domain.
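To make the Chebyshev KAN idea concrete, here is a minimal sketch of a single layer in plain Python. It is not taken from the UNETVL repository. The class name `ChebyKANLayer`, the `tanh` squashing of inputs into [-1, 1], the initialization scale, and the layer sizes are all assumptions for illustration. Where a conventional MLP uses a fixed activation, each input-output edge here carries a learnable function expressed as a weighted sum of Chebyshev polynomials.

```python
import math
import random


def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using T_{k+1}(x) = 2x*T_k(x) - T_{k-1}(x)."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


class ChebyKANLayer:
    """Hypothetical layer: y_j = sum_i sum_k coeffs[i][j][k] * T_k(tanh(x_i))."""

    def __init__(self, in_dim, out_dim, degree, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / (in_dim * (degree + 1))  # assumed initialization scale
        self.degree = degree
        self.out_dim = out_dim
        # One coefficient vector (length degree + 1) per input-output edge.
        self.coeffs = [
            [[rng.gauss(0.0, scale) for _ in range(degree + 1)] for _ in range(out_dim)]
            for _ in range(in_dim)
        ]

    def forward(self, x):
        y = [0.0] * self.out_dim
        for i, xi in enumerate(x):
            # Squash into [-1, 1], the domain on which Chebyshev polynomials are bounded.
            basis = chebyshev_basis(math.tanh(xi), self.degree)
            for j in range(self.out_dim):
                y[j] += sum(c * b for c, b in zip(self.coeffs[i][j], basis))
        return y


layer = ChebyKANLayer(in_dim=4, out_dim=2, degree=3)
output = layer.forward([0.1, -0.2, 0.3, 0.4])
print(len(output))
```

The layer stores `in_dim * out_dim * (degree + 1)` coefficients, so the polynomial degree directly controls the trade-off between expressiveness and parameter count.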