TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two problems in 3D medical image segmentation: its high computational cost and the redundant computation spent on homogeneous regions. It proposes a boundary-aware sparse visual token representation framework. A multi-scale hierarchical encoder generates 400 candidate tokens across four resolution levels, and a boundary-aware selection mechanism then combines VQ-VAE quantization with importance scoring to retain 100 salient tokens concentrated near lesion boundaries. A sparse-to-dense decoder reconstructs high-resolution segmentation masks through token reprojection, progressive upsampling, and skip connections. The approach improves lesion boundary modeling, achieving 94.49% Dice and 89.61% IoU on breast DCE-MRI data while reducing GPU memory consumption by 64% and inference latency by 68%. Strong generalization is further demonstrated on the MSD cardiac and brain MRI datasets.
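The Dice and IoU figures quoted above are overlap scores between a predicted mask and a reference mask. A minimal sketch of both follows, assuming binary masks flattened into equal-length 0/1 sequences; the function name `dice_iou` and the smoothing term `eps` are illustrative choices, not taken from the paper.

```python
from typing import Sequence, Tuple


def dice_iou(pred: Sequence[int], target: Sequence[int],
             eps: float = 1e-7) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length.

    Dice = 2|P ∩ T| / (|P| + |T|),  IoU = |P ∩ T| / |P ∪ T|.
    `eps` is a hypothetical smoothing term that avoids 0/0 when both masks are empty.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)  # |P ∩ T|
    p_sum = sum(1 for p in pred if p)                        # |P|
    t_sum = sum(1 for t in target if t)                      # |T|
    union = p_sum + t_sum - inter                            # |P ∪ T|
    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou


if __name__ == "__main__":
    pred = [1, 1, 1, 0, 0, 0]
    target = [0, 1, 1, 1, 0, 0]
    print(dice_iou(pred, target))  # Dice ≈ 0.667, IoU ≈ 0.5
```

On a single volume the two scores are tied by IoU = Dice / (2 − Dice); scores averaged over many cases, as in the reported results, need not satisfy this identity exactly.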

📝 Abstract
Three-dimensional medical image segmentation is a fundamental yet computationally demanding task, due to the cubic growth of voxel processing with volume resolution and the redundant computation on homogeneous regions. To address these limitations, we propose TokenSeg, a boundary-aware sparse token representation framework for efficient 3D medical volume segmentation. Specifically, (1) we design a multi-scale hierarchical encoder that extracts 400 candidate tokens across four resolution levels to capture both global anatomical context and fine boundary details; (2) we introduce a boundary-aware tokenizer that combines VQ-VAE quantization with importance scoring to select 100 salient tokens, over 60% of which lie near tumor boundaries; and (3) we develop a sparse-to-dense decoder that reconstructs full-resolution masks through token reprojection, progressive upsampling, and skip connections. Extensive experiments on a 3D breast DCE-MRI dataset comprising 960 cases demonstrate that TokenSeg achieves state-of-the-art performance with 94.49% Dice and 89.61% IoU, while reducing GPU memory and inference latency by 64% and 68%, respectively. To verify its generalization capability, we further evaluate TokenSeg on the MSD cardiac and brain MRI benchmark datasets, where it consistently delivers the best performance across heterogeneous anatomical structures. These results highlight the effectiveness of anatomically informed sparse representation for accurate and efficient 3D medical image segmentation.
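The boundary-aware tokenizer step (keep 100 of 400 candidates, quantize them with a VQ codebook) can be illustrated with a toy sketch. It assumes each candidate token is a feature vector with a known distance to the nearest lesion boundary, quantizes each kept token to its nearest codebook entry (the standard VQ-VAE nearest-neighbour lookup), and ranks candidates with a hypothetical importance score: feature norm damped by exp(−α·distance). The paper's actual scoring function is not described here; `quantize`, `importance`, `select_tokens`, and `alpha` are illustrative names, not TokenSeg's API.

```python
import math
from typing import List, Sequence, Tuple


def quantize(token: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """VQ lookup: index of the codebook entry closest in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(token, codebook[k])))


def importance(token: Sequence[float], boundary_dist: float, alpha: float = 1.0) -> float:
    """Hypothetical score: feature magnitude, damped for tokens far from a boundary."""
    norm = math.sqrt(sum(x * x for x in token))
    return norm * math.exp(-alpha * boundary_dist)


def select_tokens(tokens: Sequence[Sequence[float]],
                  boundary_dists: Sequence[float],
                  codebook: Sequence[Sequence[float]],
                  k: int,
                  alpha: float = 1.0) -> List[Tuple[int, int]]:
    """Keep the k highest-scoring candidates; return (token index, code index) pairs.

    Token indices come back in ascending order so a decoder can reproject
    each kept token to its original position.
    """
    scores = [importance(t, d, alpha) for t, d in zip(tokens, boundary_dists)]
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])
    return [(i, quantize(tokens[i], codebook)) for i in kept]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    tokens = [[0.1, 0.0], [0.9, 1.2], [0.5, 0.4], [2.0, 2.0]]
    boundary_dists = [5.0, 0.0, 0.5, 3.0]  # assumed distances (in voxels) to the boundary
    print(select_tokens(tokens, boundary_dists, codebook, k=2))  # → [(1, 1), (2, 0)]
```

In this toy run, token 3 has the largest feature norm but is dropped because it lies far from the boundary; a score of this shape is one way to bias selection toward boundary tokens. With the settings reported in the abstract, the call would take 400 candidates and `k=100`. At inference time a boundary distance would have to be predicted rather than read from a label; how TokenSeg handles this is not stated on this page.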
Problem

Research questions and friction points this paper is trying to address.

3D medical image segmentation
computational efficiency
voxel redundancy
homogeneous regions
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical visual token compression
boundary-aware tokenizer
sparse-to-dense decoding
efficient 3D segmentation
VQ-VAE quantization
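Sparse-to-dense decoding can likewise be sketched in a few lines. This toy version assumes each kept token carries a single scalar logit at a known coarse-grid position. It scatters the tokens back into a dense 3D grid (reprojection), fills empty cells with a background logit, doubles the resolution with nearest-neighbour upsampling at each stage, adds a same-resolution skip tensor, and thresholds the result into a mask. A real decoder would use learned feature vectors and convolutions; `reproject`, `upsample2x`, `add`, and `decode` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Grid = List[List[List[float]]]  # indexed [z][y][x]


def reproject(tokens: Dict[Tuple[int, int, int], float],
              shape: Tuple[int, int, int], fill: float) -> Grid:
    """Scatter sparse token values back to their (z, y, x) cells of a dense grid."""
    d, h, w = shape
    grid = [[[fill] * w for _ in range(h)] for _ in range(d)]
    for (z, y, x), value in tokens.items():
        grid[z][y][x] = value
    return grid


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour upsampling: each voxel becomes a 2x2x2 block."""
    d, h, w = len(grid), len(grid[0]), len(grid[0][0])
    return [[[grid[z // 2][y // 2][x // 2] for x in range(2 * w)]
             for y in range(2 * h)]
            for z in range(2 * d)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum, used here to merge a skip-connection tensor."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(pa, pb)]
            for pa, pb in zip(a, b)]


def decode(tokens: Dict[Tuple[int, int, int], float],
           coarse_shape: Tuple[int, int, int],
           skips: Sequence[Grid],
           background: float = -1.0,
           threshold: float = 0.0) -> List[List[List[int]]]:
    """Reproject, then upsample progressively; skips[i] must match stage i's shape."""
    grid = reproject(tokens, coarse_shape, fill=background)
    for skip in skips:  # coarse-to-fine: each stage doubles every spatial axis
        grid = add(upsample2x(grid), skip)
    return [[[1 if v > threshold else 0 for v in row] for row in plane] for plane in grid]


if __name__ == "__main__":
    zeros = [[[0.0] * 4 for _ in range(2)] for _ in range(2)]  # 2x2x4 skip tensor
    mask = decode({(0, 0, 0): 2.0}, coarse_shape=(1, 1, 2), skips=[zeros])
    print(mask)  # → [[[1, 1, 0, 0], [1, 1, 0, 0]], [[1, 1, 0, 0], [1, 1, 0, 0]]]
```

Here the skip tensor is all zeros, so the single foreground token simply expands into a 2x2x2 foreground block; in a trained model the skip features would carry the fine boundary detail lost at the coarse token resolution.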
Sen Zeng
Tsinghua University
Hong Zhou
Southwest Forestry University
Zheng Zhu
GigaAI
Yang Liu
King's College London
Computer Vision
Medical Image