When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation

πŸ“… 2025-11-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Medical image segmentation faces dual challenges: complex anatomical structures and scarce annotated data. CNNs struggle with long-range dependencies, while Vision Transformers (ViTs) offer global modeling capability at high data and computational cost. To address this, we propose UKAST, a U-shaped architecture integrating Group Rational KANs into a Swin Transformer encoder. Crucially, the standard Transformer feed-forward network is replaced with a novel Group Rational KAN module, preserving windowed self-attention while substantially enhancing model expressivity and data efficiency. This design reduces FLOPs with only minimal parameter growth, thereby alleviating ViTs' dependence on large-scale labeled datasets. Evaluated on four 2D and 3D medical segmentation benchmarks, UKAST consistently outperforms both CNN- and Transformer-based baselines, with particularly strong gains in low-data regimes, achieving state-of-the-art performance.

πŸ“ Abstract
Medical image segmentation is critical for accurate diagnostics and treatment planning, but remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction, but struggle with modeling long-range dependencies. Transformers, on the other hand, capture global context more effectively, but are inherently data-hungry and computationally expensive. In this work, we introduce UKAST, a U-Net-like architecture that integrates rational-function-based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders. By leveraging rational base functions and Group Rational KANs (GR-KANs) from the Kolmogorov-Arnold Transformer (KAT), our architecture addresses the inefficiencies of vanilla spline-based KANs, yielding a more expressive and data-efficient framework with reduced FLOPs and only a very small increase in parameter count compared to SwinUNETR. UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks, consistently surpassing both CNN- and Transformer-based baselines. Notably, it attains superior accuracy in data-scarce settings, alleviating the data-hungry limitations of standard Vision Transformers. These results show the potential of KAN-enhanced Transformers to advance data-efficient medical image segmentation. Code is available at: https://github.com/nsapkota417/UKAST
Problem

Research questions and friction points this paper is trying to address.

Improving medical image segmentation with limited annotated data
Reducing computational cost while maintaining Transformer performance
Enhancing global context modeling without increasing parameter count
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates KANs into Swin Transformer encoders
Uses rational base functions for improved efficiency
Achieves high accuracy with fewer FLOPs
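The core GR-KAN idea summarized above, replacing the feed-forward network's fixed activation with learnable rational functions whose coefficients are shared across channel groups, can be sketched minimally as follows. This is a hypothetical NumPy illustration, not the authors' implementation; the safe denominator `1 + |Q(x)|` follows the Padé Activation Unit convention commonly used for rational activations, and all function names and coefficient layouts here are assumptions for the sketch.

```python
import numpy as np

def rational(x, a, b):
    """Safe rational activation P(x) / (1 + |Q(x)|).

    a: numerator coefficients [a0, a1, ...] (lowest degree first)
    b: denominator coefficients [b1, b2, ...] for b1*x + b2*x^2 + ...
    (np.polyval expects highest degree first, hence the reversals.)
    """
    P = np.polyval(a[::-1], x)
    Q = np.polyval(np.concatenate(([0.0], b))[::-1], x)
    return P / (1.0 + np.abs(Q))

def group_rational_layer(x, coeffs_a, coeffs_b, num_groups):
    """Apply one shared rational activation per channel group.

    x: (batch, channels); channels are split evenly into num_groups,
    so only num_groups coefficient sets are learned instead of one
    per channel -- the parameter-efficiency trick of GR-KANs.
    """
    batch, channels = x.shape
    assert channels % num_groups == 0, "channels must divide evenly"
    group_size = channels // num_groups
    out = np.empty_like(x)
    for g in range(num_groups):
        sl = slice(g * group_size, (g + 1) * group_size)
        out[:, sl] = rational(x[:, sl], coeffs_a[g], coeffs_b[g])
    return out
```

With `a = [0, 1]` and `b = [0]` each group reduces to the identity, while `a = [0, 1]`, `b = [1]` gives the saturating curve `x / (1 + |x|)`; training would fit these coefficients per group, so the activation shape itself is learned rather than fixed as in a standard FFN.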
πŸ”Ž Similar Papers
No similar papers found.
Nishchal Sapkota
Ph.D. Candidate, University of Notre Dame
Computer Vision · Deep Learning · Self-supervised Learning · AI for Healthcare · Mathematical Modeling
Haoyan Shi
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
Yejia Zhang
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
Xianshi Ma
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
Bofang Zheng
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
D. Z. Chen
Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA