🤖 AI Summary
To address a key limitation of conventional multiple-instance learning (MIL) in whole-slide image (WSI) classification—its neglect of spatial relationships among patches, which leaves tissue structure poorly modeled—this paper proposes a probabilistic spatial attention mechanism. Specifically, it models inter-patch spatial dependencies via a learnable distance-decay prior; introduces a posterior spatial pruning strategy for data-driven, context-adaptive feature fusion; and designs a multi-head attention diversity loss to mitigate redundancy across attention heads. Pruning reduces self-attention's quadratic cost to near-linear complexity while substantially enhancing spatial representation capability. Evaluated on multiple WSI benchmarks, the method achieves state-of-the-art performance, significantly outperforming both non-spatial and mainstream spatial-context baselines. These results underscore the critical role of explicit, learnable, and adaptive spatial priors in improving diagnostic accuracy for digital pathology.
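The two core ideas—a distance-decay prior on attention logits and distance-based posterior pruning—can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the decay rate `decay` and cutoff `max_dist` are assumed hypothetical parameters (in PSA-MIL the decay is learned per head), and patch positions are taken as simple grid coordinates.

```python
import numpy as np

def spatial_attention(q, k, v, coords, decay=0.5, max_dist=1.5):
    """Illustrative self-attention with a distance-decay prior and
    distance-based pruning (hypothetical parameterization).

    q, k, v : (n, d) patch features; coords : (n, 2) patch grid positions.
    Adding -decay * dist(i, j) to the logits multiplies the softmax
    posterior by an exponentially decaying spatial prior; entries beyond
    max_dist are pruned (masked out), sparsifying the attention map.
    """
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d)                       # content similarity
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    logits = logits - decay * dist                      # log of decay prior
    logits[dist > max_dist] = -np.inf                   # posterior pruning
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                   # row-stochastic
    return w @ v, w
```

Because each patch attends only within `max_dist`, the per-patch cost drops from O(n) to a constant-size neighborhood, which is what makes the pruned attention scale to slides with tens of thousands of tiles.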
📝 Abstract
Whole Slide Images (WSIs) are high-resolution digital scans widely used in medical diagnostics. WSI classification is typically approached using Multiple Instance Learning (MIL), where the slide is partitioned into tiles treated as interconnected instances. While attention-based MIL methods aim to identify the most informative tiles, they often fail to fully exploit the spatial relationships among them, potentially overlooking intricate tissue structures crucial for accurate diagnosis. To address this limitation, we propose Probabilistic Spatial Attention MIL (PSA-MIL), a novel attention-based MIL framework that integrates spatial context into the attention mechanism through learnable distance-decayed priors, formulated within a probabilistic interpretation of self-attention as a posterior distribution. This formulation enables dynamic inference of spatial relationships during training, eliminating the need for predefined assumptions often imposed by previous approaches. Additionally, we propose a spatial pruning strategy for the posterior, effectively reducing self-attention's quadratic complexity. To further enhance spatial modeling, we introduce a diversity loss that encourages variation among attention heads, ensuring each captures distinct spatial representations. Together, these components make PSA-MIL a more data-driven and adaptive integration of spatial context, moving beyond predefined constraints. We achieve state-of-the-art performance over both contextual and non-contextual baselines, while significantly reducing computational costs.
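The diversity loss mentioned above can be sketched as a penalty on pairwise similarity between per-head attention maps. This is one plausible instantiation (mean pairwise cosine similarity of flattened maps), assumed for illustration; the paper's exact loss may differ.

```python
import numpy as np

def head_diversity_loss(attn):
    """Illustrative multi-head diversity loss (hypothetical form).

    attn : (h, n, n) row-stochastic attention maps, one per head.
    Returns the mean pairwise cosine similarity between flattened head
    maps; minimizing it pushes heads toward distinct spatial patterns.
    """
    h = attn.shape[0]
    flat = attn.reshape(h, -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = flat @ flat.T                       # (h, h) cosine similarities
    return sim[~np.eye(h, dtype=bool)].mean() # average off-diagonal entry
```

Identical heads yield a loss of 1, while heads attending to disjoint tile sets yield 0, so adding this term to the classification objective discourages redundant heads without constraining what any single head attends to.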