When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation

πŸ“… 2025-11-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Medical image segmentation faces dual challenges: complex anatomical structures and scarce annotated data. CNNs struggle with long-range dependencies, while Vision Transformers (ViTs) offer global modeling capability at high data and computational cost. To address this, we propose UKAST, a U-shaped architecture integrating Group Rational KANs into a Swin Transformer encoder. Crucially, the standard Transformer feed-forward network is replaced with a novel Group Rational KAN module, preserving windowed self-attention while substantially enhancing model expressivity and data efficiency. This design reduces FLOPs with only minimal parameter growth, thereby alleviating ViTs' dependence on large-scale labeled datasets. Evaluated on four 2D and 3D medical segmentation benchmarks, UKAST consistently outperforms both CNN- and Transformer-based baselines, with particularly strong gains in low-data regimes, achieving state-of-the-art performance.

πŸ“ Abstract
Medical image segmentation is critical for accurate diagnostics and treatment planning, but remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction, but struggle with modeling long-range dependencies. Transformers, on the other hand, capture global context more effectively, but are inherently data-hungry and computationally expensive. In this work, we introduce UKAST, a U-Net-like architecture that integrates rational-function-based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders. By leveraging rational base functions and Group Rational KANs (GR-KANs) from the Kolmogorov-Arnold Transformer (KAT), our architecture addresses the inefficiencies of vanilla spline-based KANs, yielding a more expressive and data-efficient framework with reduced FLOPs and only a very small increase in parameter count compared to SwinUNETR. UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks, consistently surpassing both CNN- and Transformer-based baselines. Notably, it attains superior accuracy in data-scarce settings, alleviating the data-hungry limitations of standard Vision Transformers. These results show the potential of KAN-enhanced Transformers to advance data-efficient medical image segmentation. Code is available at: https://github.com/nsapkota417/UKAST
Problem

Research questions and friction points this paper is trying to address.

Improving medical image segmentation with limited annotated data
Reducing computational cost while maintaining Transformer performance
Enhancing global context modeling without increasing parameter count
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates KANs into Swin Transformer encoders
Uses rational base functions for improved efficiency
Achieves high accuracy with fewer FLOPs
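The core GR-KAN idea summarized above, replacing the feed-forward network's fixed activation with learnable rational functions whose coefficients are shared across channel groups, can be sketched minimally as follows. This is a hypothetical NumPy illustration, not the authors' implementation; the safe denominator `1 + |Q(x)|` follows the Padé Activation Unit convention commonly used for rational activations, and all function names and coefficient layouts here are assumptions for the sketch.

```python
import numpy as np

def rational(x, a, b):
    """Safe rational activation P(x) / (1 + |Q(x)|).

    a: numerator coefficients [a0, a1, ...] (lowest degree first)
    b: denominator coefficients [b1, b2, ...] for b1*x + b2*x^2 + ...
    (np.polyval expects highest degree first, hence the reversals.)
    """
    P = np.polyval(a[::-1], x)
    Q = np.polyval(np.concatenate(([0.0], b))[::-1], x)
    return P / (1.0 + np.abs(Q))

def group_rational_layer(x, coeffs_a, coeffs_b, num_groups):
    """Apply one shared rational activation per channel group.

    x: (batch, channels); channels are split evenly into num_groups,
    so only num_groups coefficient sets are learned instead of one
    per channel -- the parameter-efficiency trick of GR-KANs.
    """
    batch, channels = x.shape
    assert channels % num_groups == 0, "channels must divide evenly"
    group_size = channels // num_groups
    out = np.empty_like(x)
    for g in range(num_groups):
        sl = slice(g * group_size, (g + 1) * group_size)
        out[:, sl] = rational(x[:, sl], coeffs_a[g], coeffs_b[g])
    return out
```

With `a = [0, 1]` and `b = [0]` each group reduces to the identity, while `a = [0, 1]`, `b = [1]` gives the saturating curve `x / (1 + |x|)`; training would fit these coefficients per group, so the activation shape itself is learned rather than fixed as in a standard FFN.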
πŸ”Ž Similar Papers
No similar papers found.
Nishchal Sapkota
Ph.D. Candidate, University of Notre Dame
Computer Vision · Deep Learning · Self-supervised Learning · AI for Healthcare · Mathematical Modeling
Haoyan Shi
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
Yejia Zhang
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
Xianshi Ma
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
Bofang Zheng
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
D. Z. Chen
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA