Masked strategies for images with small objects

📅 2025-04-24

📈 Citations: 0

✨ Influential: 0

career value

191K/year

🤖 AI Summary

To address the challenge of segmenting and classifying sub-pixel-scale (<5×5 pixels) blood cells in hematological images, this paper proposes a mask-based self-supervised learning framework tailored for small-object perception. The core contribution is the first adaptation of the Masked Autoencoder (MAE) masking strategy specifically for sub-pixel targets, achieved through a synergistic design of “small mask ratios” and “small image patches”—mitigating information loss inherent in conventional MAE due to oversized masks and substantially enhancing the ViT encoder’s capacity to model local-global contextual dependencies. The pre-trained encoder is integrated into a U-Net Transformer architecture for end-to-end semantic segmentation. Experimental results demonstrate significant improvements: +12.6% in segmentation mIoU, +23.4% in small-object detection F1-score, and +9.2 dB in reconstruction PSNR, validating the framework’s efficacy for fine-grained biomedical image analysis.

Technology Category

Application Category

📝 Abstract

The hematology analytics used for detection and classification of small blood components is a significant challenge. In particular, when objects exists as small pixel-sized entities in a large context of similar objects. Deep learning approaches using supervised models with pre-trained weights, such as residual networks and vision transformers have demonstrated success for many applications. Unfortunately, when applied to images outside the domain of learned representations, these methods often result with less than acceptable performance. A strategy to overcome this can be achieved by using self-supervised models, where representations are learned and weights are then applied for downstream applications. Recently, masked autoencoders have proven to be effective to obtain representations that captures global context information. By masking regions of an image and having the model learn to reconstruct both the masked and non-masked regions, weights can be used for various applications. However, if the sizes of the objects in images are less than the size of the mask, the global context information is lost, making it almost impossible to reconstruct the image. In this study, we investigated the effect of mask ratios and patch sizes for blood components using a MAE to obtain learned ViT encoder representations. We then applied the encoder weights to train a U-Net Transformer for semantic segmentation to obtain both local and global contextual information. Our experimental results demonstrates that both smaller mask ratios and patch sizes improve the reconstruction of images using a MAE. We also show the results of semantic segmentation with and without pre-trained weights, where smaller-sized blood components benefited with pre-training. Overall, our proposed method offers an efficient and effective strategy for the segmentation and classification of small objects.

Problem

Research questions and friction points this paper is trying to address.

Detecting small blood components in hematology analytics is challenging

Self-supervised models improve representation learning for small objects

Optimizing mask ratios and patch sizes enhances image reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised masked autoencoders for small objects

Smaller mask ratios and patch sizes improve reconstruction

U-Net Transformer with pre-trained ViT encoder weights

🔎 Similar Papers

No similar papers found.