🤖 AI Summary
Whole-slide image (WSI) analysis in digital pathology faces dual challenges: extreme resolution (gigapixel-scale) and scarcity of dense annotations. Conventional patch-level augmentation incurs high computational overhead, while feature-level methods lack semantic controllability. To address this, we propose the first latent-space controllable augmentation framework tailored for WSIs. Our approach employs a conditional generative model explicitly grounded in histopathologically meaningful transformations—such as color tone adjustment and morphological erosion—to enable efficient, semantically consistent, whole-slide augmentation directly in the latent space. Integrated with multi-instance learning (MIL), it eliminates per-patch processing. Evaluated across multi-organ few-shot classification and segmentation tasks, our method significantly outperforms state-of-the-art augmentation strategies, achieving superior accuracy while maintaining high inference efficiency and deployment feasibility.
📝 Abstract
Whole slide image (WSI) analysis in digital pathology presents unique challenges due to the gigapixel resolution of WSIs and the scarcity of dense supervision signals. While Multiple Instance Learning (MIL) is a natural fit for slide-level tasks, training robust models requires large and diverse datasets. Even though image augmentation techniques could be utilized to increase data variability and reduce overfitting, implementing them effectively is not a trivial task. Traditional patch-level augmentation is prohibitively expensive due to the large number of patches extracted from each WSI, and existing feature-level augmentation methods lack control over transformation semantics. We introduce HistAug, a fast and efficient generative model for controllable augmentations in the latent space for digital pathology. By conditioning on explicit patch-level transformations (e.g., hue, erosion), HistAug generates realistic augmented embeddings while preserving initial semantic information. Our method allows the processing of a large number of patches in a single forward pass efficiently, while at the same time consistently improving MIL model performance. Experiments across multiple slide-level tasks and diverse organs show that HistAug outperforms existing methods, particularly in low-data regimes. Ablation studies confirm the benefits of learned transformations over noise-based perturbations and highlight the importance of uniform WSI-wise augmentation. Code is available at https://github.com/MICS-Lab/HistAug.