🤖 AI Summary
Medical image segmentation faces dual challenges: complex anatomical structures and scarce annotated data. CNNs struggle with long-range dependencies, while Vision Transformers (ViTs) offer global modeling capability at high data and computational cost. To address this, we propose UKAST, a U-shaped architecture integrating Group Rational KANs with a Swin Transformer encoder. Crucially, we replace the standard Transformer feed-forward network with a novel Group Rational KAN module, preserving windowed self-attention while substantially enhancing model expressivity and data efficiency. This design reduces computational overhead with only minimal parameter growth, thereby alleviating the ViT's dependency on large-scale labeled datasets. Evaluated on four 2D and 3D medical segmentation benchmarks, UKAST consistently outperforms both CNN- and Transformer-based baselines, demonstrating particularly strong gains in low-data regimes and achieving state-of-the-art performance.
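To make the core idea concrete, here is a minimal NumPy sketch of a GR-KAN-style block of the kind the summary describes: channels are split into groups, each group shares one learnable rational activation P(x)/Q(x) (with a pole-free denominator, as in Padé-style activation units), followed by a linear projection. This is an illustrative toy, not the paper's implementation; all names (`rational`, `GroupRationalLayer`) and hyperparameters are assumptions for the sketch.

```python
import numpy as np

def rational(x, a, b):
    """Safe rational activation P(x)/Q(x).

    Numerator P is a degree-len(a)-1 polynomial; denominator is
    1 + |b_1 x + b_2 x^2 + ...|, which is always >= 1, so no poles.
    """
    num = sum(a[i] * x**i for i in range(len(a)))
    den = 1.0 + np.abs(sum(b[j] * x**(j + 1) for j in range(len(b))))
    return num / den

class GroupRationalLayer:
    """Toy GR-KAN-style layer: the channel dimension is split into
    `groups` slices, each slice sharing one set of rational-function
    coefficients, followed by a dense linear mix (stand-in for the
    Transformer FFN it replaces)."""

    def __init__(self, dim, groups=4, m=3, n=2, seed=0):
        assert dim % groups == 0, "dim must be divisible by groups"
        rng = np.random.default_rng(seed)
        self.groups = groups
        self.a = rng.normal(0.0, 0.1, (groups, m + 1))  # numerator coeffs per group
        self.b = rng.normal(0.0, 0.1, (groups, n))      # denominator coeffs per group
        self.W = rng.normal(0.0, 0.1, (dim, dim))       # output linear projection

    def __call__(self, x):
        # x: (batch, dim); apply each group's shared rational activation
        # to its channel slice, then mix channels linearly.
        g = x.shape[1] // self.groups
        parts = [rational(x[:, i * g:(i + 1) * g], self.a[i], self.b[i])
                 for i in range(self.groups)]
        return np.concatenate(parts, axis=1) @ self.W
```

Because each group shares one activation and the coefficient count is tiny relative to the linear weights, such a block adds very few parameters over a plain FFN, which matches the "minimal parameter growth" claim above.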
📄 Abstract
Medical image segmentation is critical for accurate diagnostics and treatment planning, but remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction but struggle to model long-range dependencies. Transformers, on the other hand, capture global context more effectively, but are inherently data-hungry and computationally expensive. In this work, we introduce UKAST, a U-Net-like architecture that integrates rational-function-based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders. By leveraging rational base functions and Group Rational KANs (GR-KANs) from the Kolmogorov-Arnold Transformer (KAT), our architecture addresses the inefficiencies of vanilla spline-based KANs, yielding a more expressive and data-efficient framework with reduced FLOPs and only a very small increase in parameter count compared to SwinUNETR. UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks, consistently surpassing both CNN- and Transformer-based baselines. Notably, it attains superior accuracy in data-scarce settings, alleviating the data-hungry limitations of standard Vision Transformers. These results show the potential of KAN-enhanced Transformers to advance data-efficient medical image segmentation. Code is available at: https://github.com/nsapkota417/UKAST