🤖 AI Summary
To address the scarcity of pixel-level annotations and the limited representation robustness in semi-supervised medical image segmentation, this paper presents the first integration of learnable-activation Kolmogorov–Arnold Networks (KANs) into the U-Net encoder-decoder architecture. We introduce a selective KAN embedding strategy that targets the bottleneck and high-level semantic layers; design a dimensionality-compression and horizontal-expansion mechanism to balance expressiveness against computational cost; and construct a multi-branch uncertainty-estimation framework to improve pseudo-label reliability. Evaluated on four public medical imaging benchmarks, our method achieves significant gains over state-of-the-art CNN- and ViT-based baselines while using fewer KAN layers and incurring lower computational overhead, demonstrating the effectiveness and strong generalization of KANs in semi-supervised medical segmentation.
📝 Abstract
Deep learning-based medical image segmentation has shown remarkable success; however, it typically requires extensive pixel-level annotations, which are both expensive and time-intensive. Semi-supervised medical image segmentation (SSMIS) offers a viable alternative, driven by advancements in CNNs and ViTs. However, these networks often rely on fixed activation functions and linear modeling patterns, limiting their ability to learn robust representations. Given the limited availability of labeled data, achieving robust representation learning becomes crucial. Inspired by Kolmogorov-Arnold Networks (KANs), we propose Semi-KAN, which leverages the untapped potential of KANs to enhance backbone architectures for representation learning in SSMIS. Our findings indicate that: (1) compared to networks with fixed activation functions, KANs exhibit superior representation learning capabilities with fewer parameters, and (2) KANs excel in high-semantic feature spaces. Building on these insights, we integrate KANs into tokenized intermediate representations, applying them selectively at the encoder's bottleneck and the decoder's top layers within a U-Net pipeline to extract high-level semantic features. Although learnable activation functions improve feature expansion, they introduce significant computational overhead with only marginal performance gains. To mitigate this, we reduce the feature dimensions and employ horizontal scaling to capture multiple pattern representations. Furthermore, we design a multi-branch U-Net architecture with uncertainty estimation to effectively learn diverse pattern representations. Extensive experiments on four public datasets demonstrate that Semi-KAN surpasses baseline networks while utilizing fewer KAN layers and lower computational cost, underscoring the potential of KANs as a promising approach for SSMIS.
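To make the core idea concrete, the sketch below shows a single KAN-style layer in NumPy: unlike a standard linear layer followed by one fixed activation, each input–output edge carries its own learnable univariate function. This is a toy illustration only; the parameterization here (a weighted sum of Gaussian radial basis functions with hypothetical names `kan_layer`, `coeffs`, `centers`) is an assumption for readability, whereas KAN implementations such as the one in the paper typically use B-spline bases.

```python
import numpy as np

def kan_layer(x, coeffs, centers, width=0.5):
    """One KAN-style layer: each edge (i, j) applies its own learnable
    univariate function phi_ij(x_i), parameterized here as a weighted sum
    of Gaussian radial basis functions (a simplification; real KANs use
    B-splines). Output j sums phi_ij over all input edges i."""
    # x: (batch, d_in); coeffs: (d_in, d_out, n_basis); centers: (n_basis,)
    # Evaluate every basis function at every input coordinate.
    basis = np.exp(-((x[..., None] - centers) / width) ** 2)  # (batch, d_in, n_basis)
    # phi_ij(x_i) = sum_k coeffs[i, j, k] * basis_k(x_i); then sum over i.
    return np.einsum("bik,ijk->bj", basis, coeffs)

rng = np.random.default_rng(0)
d_in, d_out, n_basis = 4, 2, 8
centers = np.linspace(-2.0, 2.0, n_basis)          # fixed basis centers
coeffs = rng.normal(size=(d_in, d_out, n_basis)) * 0.1  # learnable per-edge weights
x = rng.normal(size=(16, d_in))
y = kan_layer(x, coeffs, centers)
print(y.shape)  # (16, 2)
```

In Semi-KAN, layers of this kind replace only the bottleneck and top-decoder blocks of the U-Net; the per-edge `coeffs` are what make the activations learnable, and shrinking `d_in`/`d_out` before the KAN block corresponds to the dimensionality compression the paper uses to keep this overhead small.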