Focus on Texture: Rethinking Pre-training in Masked Autoencoders for Medical Image Classification

📅 2025-07-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In medical image classification, texture cues are critical, yet conventional masked autoencoders (MAEs) suffer from blurry representations due to pixel-level mean squared error (MSE) reconstruction loss. To address this, we propose GLCM-MAE—the first self-supervised MAE framework integrating radiomics-based gray-level co-occurrence matrix (GLCM) features into pretraining. Its core innovation is a differentiable GLCM-matching loss that explicitly enforces modeling of texture and spatial structural information during reconstruction. Built upon the standard MAE architecture, GLCM-MAE replaces pixel-wise reconstruction with matching of GLCM statistics—including contrast, correlation, homogeneity, and entropy. Evaluated on four medical classification tasks, it achieves consistent improvements: +2.1% in gallbladder cancer ultrasound detection, +3.1% in breast cancer ultrasound detection, +0.5% in pneumonia X-ray classification, and +0.6% in COVID-19 CT classification—surpassing state-of-the-art methods across all benchmarks.

📝 Abstract
Masked Autoencoders (MAEs) have emerged as a dominant strategy for self-supervised representation learning in natural images, where models are pre-trained to reconstruct masked patches with a pixel-wise mean squared error (MSE) between original and reconstructed RGB values as the loss. We observe that MSE encourages blurred image reconstruction, but still works for natural images as it preserves dominant edges. However, in medical imaging, where texture cues are more important for classification of a visual abnormality, the strategy fails. Taking inspiration from Gray Level Co-occurrence Matrix (GLCM) features in radiomics studies, we propose a novel MAE-based pre-training framework, GLCM-MAE, using a reconstruction loss based on matching GLCMs. GLCM captures intensity and spatial relationships in an image, hence the proposed loss helps preserve morphological features. Further, we propose a novel formulation to convert GLCM matching into a differentiable loss function. We demonstrate that unsupervised pre-training on medical images with the proposed GLCM loss improves representations for downstream tasks. GLCM-MAE outperforms the current state-of-the-art across four tasks: gallbladder cancer detection from ultrasound images by 2.1%, breast cancer detection from ultrasound by 3.1%, pneumonia detection from X-rays by 0.5%, and COVID detection from CT by 0.6%. Source code and pre-trained models are available at: https://github.com/ChetanMadan/GLCM-MAE.
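The key obstacle the abstract mentions is that a standard GLCM uses hard quantization of gray levels, which is not differentiable. One common way to sidestep this (a minimal sketch only; the function names, bin count, Gaussian soft-assignment, and the choice of texture statistics here are illustrative assumptions, not the paper's actual formulation) is to soft-assign each pixel to gray-level bins, accumulate a co-occurrence matrix from products of neighboring assignments, and then match summary texture statistics between the original and reconstructed image:

```python
import numpy as np

def soft_glcm(img, levels=8, offset=(0, 1), sigma=0.5):
    """Soft co-occurrence matrix for an image with values in [0, 1].

    Hard quantization is replaced by a Gaussian soft assignment of each
    pixel to `levels` gray-level bins, so the result is smooth in the
    pixel values (hypothetical sketch, not the paper's formulation).
    """
    centers = np.linspace(0.0, 1.0, levels)
    # Soft membership of each pixel to each gray level: shape (H, W, levels)
    w = np.exp(-((img[..., None] - centers) ** 2) / (2 * sigma ** 2))
    w /= w.sum(axis=-1, keepdims=True)
    dy, dx = offset
    H, W = img.shape
    a = w[:H - dy, :W - dx]  # reference pixels
    b = w[dy:, dx:]          # neighbors at the given spatial offset
    # Accumulate co-occurrence mass over all pixel pairs, then normalize
    glcm = np.einsum('ijk,ijl->kl', a, b)
    return glcm / glcm.sum()

def texture_stats(glcm):
    """Contrast, homogeneity, and entropy of a normalized GLCM."""
    L = glcm.shape[0]
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing='ij')
    contrast = np.sum((i - j) ** 2 * glcm)
    homogeneity = np.sum(glcm / (1.0 + (i - j) ** 2))
    entropy = -np.sum(glcm * np.log(glcm + 1e-8))
    return np.array([contrast, homogeneity, entropy])

def glcm_matching_loss(original, reconstruction):
    """MSE between texture statistics of the two images' soft GLCMs."""
    s1 = texture_stats(soft_glcm(original))
    s2 = texture_stats(soft_glcm(reconstruction))
    return np.mean((s1 - s2) ** 2)
```

Because every step is a smooth function of the pixel values, the same construction written in an autograd framework would backpropagate into the decoder, which is the property a GLCM-based reconstruction loss needs; a blurred reconstruction raises homogeneity and lowers contrast relative to the original, so the loss penalizes exactly the failure mode MSE tolerates.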
Problem

Research questions and friction points this paper is trying to address.

Pixel-wise MSE reconstruction in MAE pre-training encourages blurry outputs, discarding the texture cues that drive medical image classification
Standard MAEs preserve dominant edges, which suffices for natural images but fails for texture-dependent visual abnormalities
GLCM computation involves hard quantization, so matching GLCMs cannot be used directly as a differentiable reconstruction loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates a radiomics-inspired GLCM-matching loss into MAE pre-training to preserve texture
Proposes a novel formulation that converts GLCM matching into a differentiable loss function
Improves downstream classification across four medical imaging tasks, surpassing the state-of-the-art
Chetan Madan
Indian Institute of Technology, Delhi
Aarjav Satia
Indian Institute of Technology, Delhi
Soumen Basu
Indian Institute of Technology, Delhi
Pankaj Gupta
PGIMER, Chandigarh
Usha Dutta
PGIMER, Chandigarh
Gallbladder cancer, Gallstones, Nutrition, IBD
Chetan Arora
Indian Institute of Technology, Delhi