Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning

📅 2025-05-29

🤖 AI Summary

Skin lesion classification faces challenges including variable imaging conditions and the absence of phenotypic and clinical context, which limits the ability of unimodal image models to perform the holistic risk assessment required in clinical decision-making. To address this, the authors propose SLIMP, a nested multi-granularity contrastive learning framework that jointly models lesion images, lesion-level metadata (e.g., anatomical location, size), and patient-level electronic health records (e.g., medical history, family history). SLIMP uses cross-modal embedding alignment and hierarchical representation pretraining to combine these heterogeneous data sources. Evaluated on multiple skin lesion classification benchmarks, SLIMP consistently outperforms state-of-the-art unimodal and multimodal approaches. The learned representations exhibit enhanced discriminability and clinical interpretability, offering a novel paradigm for real-world, clinically grounded computer-aided skin cancer diagnosis.

📝 Abstract

We introduce SLIMP (Skin Lesion Image-Metadata Pre-training) for learning rich representations of skin lesions through a novel nested contrastive learning approach that captures complex relationships between images and metadata. Melanoma detection and skin lesion classification based solely on images pose significant challenges due to large variations in imaging conditions (lighting, color, resolution, distance, etc.) and the lack of clinical and phenotypical context. Clinicians typically follow a holistic approach when assessing a patient's risk level and deciding which lesions may be malignant and need to be excised, considering the patient's medical history as well as the appearance of the patient's other lesions. Inspired by this, SLIMP combines the appearance and metadata of individual skin lesions with patient-level metadata relating to the patient's medical record and other clinically relevant information. By fully exploiting all available data modalities throughout the learning process, the proposed pre-training strategy improves performance over other pre-training strategies on downstream skin lesion classification tasks, highlighting the quality of the learned representations.
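
To make the idea of a nested contrastive objective concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the loss, the encoders, or how lesions are aggregated per patient. The sketch assumes a CLIP-style symmetric InfoNCE loss at both levels, mean pooling of lesion representations per patient, and precomputed embeddings. All function and parameter names (`symmetric_info_nce`, `nested_contrastive_loss`, `patient_ids`, `temperature`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(a, b, temperature=0.1):
    # Assumed CLIP-style loss: row i of `a` is the positive for row i of `b`;
    # every other row in the batch acts as a negative.
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)


def nested_contrastive_loss(img_emb, lesion_meta_emb, patient_ids,
                            patient_meta_emb, temperature=0.1):
    # Inner level: align each lesion image with its own lesion metadata.
    inner = symmetric_info_nce(img_emb, lesion_meta_emb, temperature)

    # Outer level (assumption): mean-pool the lesion representations of each
    # patient, then align the pooled vector with that patient's metadata.
    lesion_repr = l2_normalize(img_emb) + l2_normalize(lesion_meta_emb)
    n_patients = patient_meta_emb.shape[0]
    pooled = np.zeros((n_patients, lesion_repr.shape[1]))
    np.add.at(pooled, patient_ids, lesion_repr)
    counts = np.bincount(patient_ids, minlength=n_patients)[:, None]
    pooled = pooled / np.maximum(counts, 1)
    outer = symmetric_info_nce(pooled, patient_meta_emb, temperature)

    return inner + outer


# Toy usage with random embeddings: 8 lesions from 4 patients, 16 dimensions.
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
lesion_meta = rng.normal(size=(8, 16))
patient_meta = rng.normal(size=(4, 16))
patient_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss = nested_contrastive_loss(img, lesion_meta, patient_ids, patient_meta)
```

In a real training setup the embeddings would come from trainable image and tabular encoders, and the loss would be written in an autodiff framework. The sketch only illustrates how an image-to-lesion-metadata term can sit inside a patient-level term.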

Problem

Research questions and friction points this paper is trying to address.

Improving skin lesion classification using multi-modal data

Addressing imaging variability in melanoma detection

Enhancing clinical context integration for lesion analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Nested multi-modal contrastive learning approach

Combines lesion images with patient metadata

Improves skin lesion classification performance
