Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning

2025-05-29
AI Summary
Skin lesion classification is hampered by variable imaging conditions and by the absence of phenotypic and clinical context, which limits the ability of unimodal image models to perform the holistic risk assessment required in clinical decision-making. To address this, we propose SLIMP, a nested multi-granularity contrastive learning framework that jointly models lesion images, lesion-level metadata (e.g., anatomical location, size), and patient-level electronic health records (e.g., medical history, family history). SLIMP employs cross-modal embedding alignment and hierarchical representation pretraining to integrate these heterogeneous modalities. Evaluated on multiple skin lesion classification benchmarks, SLIMP consistently outperforms state-of-the-art unimodal and multimodal approaches. The learned representations exhibit enhanced discriminability and clinical interpretability, offering a novel paradigm for real-world, clinically grounded computer-aided skin cancer diagnosis.

๐Ÿ“ Abstract
We introduce SLIMP (Skin Lesion Image-Metadata Pre-training) for learning rich representations of skin lesions through a novel nested contrastive learning approach that captures complex relationships between images and metadata. Melanoma detection and skin lesion classification based solely on images pose significant challenges due to large variations in imaging conditions (lighting, color, resolution, distance, etc.) and the lack of clinical and phenotypical context. Clinicians typically follow a holistic approach when assessing a patient's risk level and deciding which lesions may be malignant and need to be excised, considering the patient's medical history as well as the appearance of the patient's other lesions. Inspired by this, SLIMP combines the appearance and metadata of individual skin lesions with patient-level metadata relating to the patient's medical record and other clinically relevant information. By fully exploiting all available data modalities throughout the learning process, the proposed pre-training strategy improves performance over other pre-training strategies on downstream skin lesion classification tasks, highlighting the quality of the learned representations.
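
The abstract does not spell out the loss, so the following is only a rough sketch of what a two-level ("nested") contrastive objective could look like, not the authors' implementation. It assumes a symmetric InfoNCE loss at both levels, element-wise averaging to fuse a lesion's image and metadata embeddings, mean pooling over a patient's lesions, and an unweighted sum of the two terms. All function and parameter names (info_nce, nested_loss, lesion_patient, and so on) are hypothetical, and embeddings are assumed to be non-zero vectors already produced by some encoders.

```python
import math


def l2_normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    # Cross-entropy of matching anchor i to positive i against all positives in the batch.
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum - logits[i]
    return total / len(a)


def symmetric_info_nce(xs, ys, temperature=0.1):
    # Average the two matching directions (x -> y and y -> x).
    return 0.5 * (info_nce(xs, ys, temperature) + info_nce(ys, xs, temperature))


def mean_pool(vectors):
    # Element-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nested_loss(image_emb, lesion_meta_emb, lesion_patient, patient_meta_emb,
                temperature=0.1):
    # Inner level (assumed): align each lesion image with its own lesion metadata.
    inner = symmetric_info_nce(image_emb, lesion_meta_emb, temperature)

    # Outer level (assumed): fuse and pool each patient's lesions, then align the
    # pooled vector with that patient's metadata embedding.
    patient_ids = sorted(set(lesion_patient))
    pooled = []
    for pid in patient_ids:
        fused = [
            [(a + b) / 2 for a, b in zip(image_emb[i], lesion_meta_emb[i])]
            for i, owner in enumerate(lesion_patient)
            if owner == pid
        ]
        pooled.append(mean_pool(fused))
    targets = [patient_meta_emb[pid] for pid in patient_ids]
    outer = symmetric_info_nce(pooled, targets, temperature)

    # Unweighted sum of the two levels; the weighting is an assumption.
    return inner + outer
```

Here lesion_patient[i] gives the patient that lesion i belongs to, and patient_meta_emb maps each patient id to its metadata embedding. In this sketch the outer term only becomes small when the pooled lesion vector of each patient is closer to that patient's metadata than to any other patient's, which is one way to read the idea of contrasting at both the lesion and the patient level.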
Problem

Research questions and friction points this paper is trying to address.

Improving skin lesion classification using multi-modal data
Addressing imaging variability in melanoma detection
Enhancing clinical context integration for lesion analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nested multi-modal contrastive learning approach
Combines lesion images with patient metadata
Improves skin lesion classification performance
D. Christopoulos
Remote Sensing Lab, National Technical University of Athens, Athens, Greece
Sotiris Spanos
Remote Sensing Lab, National Technical University of Athens, Athens, Greece
Eirini Baltzi
Remote Sensing Lab, National Technical University of Athens, Athens, Greece
Valsamis Ntouskos
Universitas Mercatorum, National Technical University of Athens
Computer Vision, Pattern Recognition, Robotics
Konstantinos Karantzalos
Remote Sensing Lab., National Technical University of Athens