🤖 AI Summary
To jointly address accuracy, efficiency, and annotation cost in histopathological tissue segmentation, this paper proposes the Unit-based Tissue Segmentation (UTS) framework. UTS treats each fixed-size 32×32-pixel tile as the atomic classification unit, which removes the need for pixel-level annotation and reduces both manual labeling effort and computational overhead. To capture fine-grained morphological detail together with global tissue context, the paper introduces a Multi-Level Vision Transformer (L-ViT) built on multi-level feature representations. Evaluated on an H&E-stained dataset of 459 regions and 386,371 tiles, UTS outperforms U-Net variants and transformer-based baselines on three-class breast tissue segmentation (infiltrating tumor, non-neoplastic stroma, and fat). The resulting tile-level maps support clinically relevant tasks such as tumor-stroma quantification and surgical margin assessment.
📝 Abstract
We propose UTS, a unit-based tissue segmentation framework for histopathology that uses each fixed-size 32 × 32 tile, rather than each pixel, as the segmentation unit and assigns it a single class. This approach reduces annotation effort and improves computational efficiency without compromising accuracy. To implement it, we introduce a Multi-Level Vision Transformer (L-ViT), which exploits multi-level feature representations to capture both fine-grained morphology and global tissue context. Trained to segment breast tissue into three categories (infiltrating tumor, non-neoplastic stroma, and fat), UTS supports clinically relevant tasks such as tumor-stroma quantification and surgical margin assessment. Evaluated on 386,371 tiles from 459 H&E-stained regions, it outperforms U-Net variants and transformer-based baselines. Code and dataset will be made available on GitHub.
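The tile-as-unit idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the cropping of incomplete edge tiles, and the class indices (0 = tumor, 1 = stroma, 2 = fat) are assumptions made for illustration, and the classifier itself (L-ViT) is omitted. The sketch only shows how a region might be cut into 32 × 32 tiles, how a grid of per-tile labels maps back to a pixel-resolution mask, and how per-class tile fractions could feed a quantification step.

```python
import numpy as np

TILE = 32  # tile side length in pixels, as stated in the abstract


def split_into_tiles(region: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Split an (H, W, C) region into non-overlapping tile x tile patches.

    Returns an array of shape (rows, cols, tile, tile, C). Pixels on the
    right/bottom edge that do not fill a whole tile are cropped
    (an assumption; the paper may handle borders differently).
    """
    h, w, c = region.shape
    rows, cols = h // tile, w // tile
    cropped = region[: rows * tile, : cols * tile]
    # Split each spatial axis into (grid index, within-tile offset),
    # then bring the two grid indices to the front.
    return cropped.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)


def labels_to_mask(tile_labels: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Expand a (rows, cols) grid of tile labels into a pixel-level mask."""
    return tile_labels.repeat(tile, axis=0).repeat(tile, axis=1)


def class_fractions(tile_labels: np.ndarray, n_classes: int = 3) -> np.ndarray:
    """Fraction of tiles assigned to each class (hypothetical class order:
    0 = tumor, 1 = stroma, 2 = fat)."""
    counts = np.bincount(tile_labels.ravel(), minlength=n_classes)
    return counts / counts.sum()
```

With this layout, one label per tile replaces 1,024 per-pixel labels, which is where the reduction in annotation and computation comes from; the trade-off is that boundaries in the output mask are resolved only to tile granularity.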