AI Summary
To address the inefficiency and global context loss inherent in sliding-window inference for 3D medical image segmentation, this paper proposes NMSW-Net, a window-free, end-to-end framework. Its core innovation is a differentiable Top-k patch sampling mechanism that dynamically focuses computation on salient regions. Coupled with multi-scale feature distillation and global coarse prediction guidance, the method enables synergistic global-local modeling. Model-agnostic and plug-and-play, NMSW-Net preserves or improves segmentation accuracy while drastically reducing computational cost: FLOPs drop by 90% (from 87.5 to 7.95 TFLOPS); inference accelerates 4× on an H100 GPU (19.0 s → 4.3 s) and 7× on CPU (1710 s → 230 s). These gains significantly advance real-time, high-precision 3D medical image segmentation.
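As a quick sanity check on the headline figures, the rounded "90%", "4×", and "7×" claims can be recomputed from the raw numbers quoted above. The helper names `reduction_pct` and `speedup` are illustrative only and do not come from the paper.

```python
def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster `after` is than `before` (ratio of run times)."""
    return before / after


# Raw figures as quoted in the summary.
compute = reduction_pct(87.5, 7.95)   # compute cost, in the quoted TFLOPS unit
gpu = speedup(19.0, 4.3)              # H100 GPU inference time, seconds
cpu = speedup(1710, 230)              # CPU inference time, seconds

print(f"compute reduction: {compute:.1f}%")  # 90.9%
print(f"GPU speedup: {gpu:.1f}x")            # 4.4x
print(f"CPU speedup: {cpu:.1f}x")            # 7.4x
```

The quoted "90%", "4×", and "7×" are therefore slightly conservative roundings of roughly 90.9%, 4.4×, and 7.4×.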
Abstract
3D models are favored over 2D models for 3D medical image segmentation because they can leverage inter-slice relationships, yielding higher segmentation accuracy. However, 3D models demand significantly more GPU memory because of their larger model size and intermediate tensors. A common solution is to use patch-based training and make whole-volume predictions with sliding window (SW) inference. SW inference reduces memory usage but is slower, because it allocates equal resources to every patch, and less accurate, because it overlooks global features beyond each patch. We propose NMSW-Net (No-More-Sliding-Window-Net), a novel framework that enhances the efficiency and accuracy of any given 3D segmentation model by eliminating SW inference and incorporating global predictions when necessary. NMSW-Net incorporates a differentiable Top-k module to sample only the relevant patches that enhance segmentation accuracy, thereby minimizing redundant computations. Additionally, it learns to leverage coarse global predictions when patch prediction alone is insufficient. NMSW-Net is model-agnostic, making it compatible with any 3D segmentation model that previously relied on SW inference. Evaluated across 3 tasks with 3 segmentation backbones, NMSW-Net achieves competitive or sometimes superior accuracy compared to SW, while reducing computational complexity by 90% (87.5 to 7.95 TFLOPS), delivering 4x faster inference on the H100 GPU (19.0 to 4.3 sec), and 7x faster inference on the Intel Xeon Gold CPU (1710 to 230 seconds).
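To make the contrast between sliding-window inference and patch selection concrete, the following minimal sketch enumerates every window a sliding-window pass would visit and then keeps only the k highest-scoring ones. It rests on several assumptions: the volume shape, patch size, overlap, and the saliency scores (here a hypothetical distance to a made-up salient location) are invented for illustration, and the selection shown is a plain hard Top-k, not the differentiable Top-k module that NMSW-Net trains end to end.

```python
import heapq
import math
from itertools import product


def sliding_window_starts(dim, patch, overlap):
    """Start offsets of windows along one axis so that the axis is fully covered."""
    if dim <= patch:
        return [0]
    stride = max(1, int(patch * (1 - overlap)))
    n = math.ceil((dim - patch) / stride) + 1
    # Clamp the last window so it never runs past the volume border.
    return [min(i * stride, dim - patch) for i in range(n)]


def all_patch_origins(shape, patch, overlap):
    """Origins of every patch a sliding-window pass would process."""
    axes = [sliding_window_starts(d, p, overlap) for d, p in zip(shape, patch)]
    return list(product(*axes))


def top_k_patches(origins, scores, k):
    """Hard (non-differentiable) Top-k: keep the k highest-scoring patch origins."""
    ranked = heapq.nlargest(k, zip(scores, origins))
    return [origin for _, origin in ranked]


# Hypothetical setup: a 256^3 volume, 128^3 patches, 50% overlap.
shape, patch = (256, 256, 256), (128, 128, 128)
origins = all_patch_origins(shape, patch, overlap=0.5)

# Hypothetical saliency: patches whose centre is closer to a made-up
# salient location score higher. A real system would derive scores
# from a learned coarse global prediction instead.
target = (200, 60, 130)
scores = [
    -math.dist([o + p / 2 for o, p in zip(origin, patch)], target)
    for origin in origins
]

selected = top_k_patches(origins, scores, k=5)
print(len(origins), len(selected))  # 27 5
```

In this toy configuration the sliding-window pass runs the backbone on 27 patches while the Top-k variant runs it on 5, which illustrates where the compute saving comes from; the actual patch counts and savings in the paper depend on its own settings.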