Focus on Texture: Rethinking Pre-training in Masked Autoencoders for Medical Image Classification

📅 2025-07-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In medical image classification, texture cues are critical, yet conventional masked autoencoders (MAEs) suffer from blurry representations due to pixel-level mean squared error (MSE) reconstruction loss. To address this, we propose GLCM-MAE—the first self-supervised MAE framework integrating radiomics-based gray-level co-occurrence matrix (GLCM) features into pretraining. Its core innovation is a differentiable GLCM-matching loss that explicitly enforces modeling of texture and spatial structural information during reconstruction. Built upon the standard MAE architecture, GLCM-MAE replaces pixel-wise reconstruction with matching of GLCM statistics—including contrast, correlation, homogeneity, and entropy. Evaluated on four medical classification tasks, it achieves consistent improvements: +2.1% in gallbladder cancer ultrasound detection, +3.1% in breast cancer ultrasound detection, +0.5% in pneumonia X-ray classification, and +0.6% in COVID-19 CT classification—surpassing state-of-the-art methods across all benchmarks.

📝 Abstract
Masked Autoencoders (MAEs) have emerged as a dominant strategy for self-supervised representation learning in natural images, where models are pre-trained to reconstruct masked patches with a pixel-wise mean squared error (MSE) between original and reconstructed RGB values as the loss. We observe that MSE encourages blurred image reconstruction, but still works for natural images as it preserves dominant edges. However, in medical imaging, where texture cues are more important for classification of a visual abnormality, the strategy fails. Taking inspiration from Gray Level Co-occurrence Matrix (GLCM) features in radiomics studies, we propose a novel MAE-based pre-training framework, GLCM-MAE, using a reconstruction loss based on matching GLCMs. GLCM captures intensity and spatial relationships in an image, hence the proposed loss helps preserve morphological features. Further, we propose a novel formulation to convert GLCM matching into a differentiable loss function. We demonstrate that unsupervised pre-training on medical images with the proposed GLCM loss improves representations for downstream tasks. GLCM-MAE outperforms the current state-of-the-art across four tasks: gallbladder cancer detection from ultrasound images by 2.1%, breast cancer detection from ultrasound by 3.1%, pneumonia detection from X-rays by 0.5%, and COVID detection from CT by 0.6%. Source code and pre-trained models are available at: https://github.com/ChetanMadan/GLCM-MAE.
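The key obstacle the abstract mentions is that a standard GLCM uses hard quantization of gray levels, which is not differentiable. One common way to sidestep this (a minimal sketch only; the function names, bin count, Gaussian soft-assignment, and the choice of texture statistics here are illustrative assumptions, not the paper's actual formulation) is to soft-assign each pixel to gray-level bins, accumulate a co-occurrence matrix from products of neighboring assignments, and then match summary texture statistics between the original and reconstructed image:

```python
import numpy as np

def soft_glcm(img, levels=8, offset=(0, 1), sigma=0.5):
    """Soft co-occurrence matrix for an image with values in [0, 1].

    Hard quantization is replaced by a Gaussian soft assignment of each
    pixel to `levels` gray-level bins, so the result is smooth in the
    pixel values (hypothetical sketch, not the paper's formulation).
    """
    centers = np.linspace(0.0, 1.0, levels)
    # Soft membership of each pixel to each gray level: shape (H, W, levels)
    w = np.exp(-((img[..., None] - centers) ** 2) / (2 * sigma ** 2))
    w /= w.sum(axis=-1, keepdims=True)
    dy, dx = offset
    H, W = img.shape
    a = w[:H - dy, :W - dx]  # reference pixels
    b = w[dy:, dx:]          # neighbors at the given spatial offset
    # Accumulate co-occurrence mass over all pixel pairs, then normalize
    glcm = np.einsum('ijk,ijl->kl', a, b)
    return glcm / glcm.sum()

def texture_stats(glcm):
    """Contrast, homogeneity, and entropy of a normalized GLCM."""
    L = glcm.shape[0]
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing='ij')
    contrast = np.sum((i - j) ** 2 * glcm)
    homogeneity = np.sum(glcm / (1.0 + (i - j) ** 2))
    entropy = -np.sum(glcm * np.log(glcm + 1e-8))
    return np.array([contrast, homogeneity, entropy])

def glcm_matching_loss(original, reconstruction):
    """MSE between texture statistics of the two images' soft GLCMs."""
    s1 = texture_stats(soft_glcm(original))
    s2 = texture_stats(soft_glcm(reconstruction))
    return np.mean((s1 - s2) ** 2)
```

Because every step is a smooth function of the pixel values, the same construction written in an autograd framework would backpropagate into the decoder, which is the property a GLCM-based reconstruction loss needs; a blurred reconstruction raises homogeneity and lowers contrast relative to the original, so the loss penalizes exactly the failure mode MSE tolerates.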
Problem

Research questions and friction points this paper is trying to address.

Pixel-wise MSE reconstruction in MAE pre-training encourages blurry outputs, discarding the texture cues that drive medical image classification
Standard MAEs preserve dominant edges, which suffices for natural images but fails for texture-dependent visual abnormalities
GLCM computation involves hard quantization, so matching GLCMs cannot be used directly as a differentiable reconstruction loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates a radiomics-inspired GLCM-matching loss into MAE pre-training to preserve texture
Proposes a novel formulation that converts GLCM matching into a differentiable loss function
Improves downstream classification across four medical imaging tasks, surpassing the state-of-the-art
Chetan Madan
Indian Institute of Technology, Delhi
Aarjav Satia
Indian Institute of Technology, Delhi
Soumen Basu
Indian Institute of Technology, Delhi
Pankaj Gupta
PGIMER, Chandigarh
Usha Dutta
PGIMER, Chandigarh
Gallbladder cancer, Gallstones, Nutrition, IBD
Chetan Arora
Indian Institute of Technology, Delhi