Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited interpretability of deep learning models in melanoma diagnosis—which hinders clinical adoption—this paper proposes a cross-modal interpretable diagnostic framework. Methodologically, it introduces a dual-projection-head contrastive learning mechanism that explicitly aligns clinically relevant dermatoscopic criteria (e.g., asymmetry, border irregularity, color variation) with visual features extracted by a Vision Transformer; it further integrates a large language model to generate structured textual diagnostic reports. The contributions include: (1) achieving high diagnostic accuracy (92.79% accuracy, 0.961 AUC) while ensuring decision transparency; (2) significantly improving multiple quantitative interpretability metrics (e.g., faithfulness, plausibility); and (3) demonstrating strong alignment between model-generated visual attributions and dermatologists’ clinical judgments. This work establishes a new paradigm for AI-assisted dermatological diagnosis that balances robust performance with clinical trustworthiness.

📝 Abstract
Deep learning has demonstrated expert-level performance in melanoma classification, positioning it as a powerful tool in clinical dermatology. However, model opacity and the lack of interpretability remain critical barriers to clinical adoption, as clinicians often struggle to trust the decision-making processes of black-box models. To address this gap, we present a Cross-modal Explainable Framework for Melanoma (CEFM) that leverages contrastive learning as the core mechanism for achieving interpretability. Specifically, CEFM maps clinical criteria for melanoma diagnosis, namely Asymmetry, Border, and Color (ABC), into the Vision Transformer embedding space using dual projection heads, thereby aligning clinical semantics with visual features. The aligned representations are subsequently translated into structured textual explanations via natural language generation, creating a transparent link between raw image data and clinical interpretation. Experiments on public datasets demonstrate 92.79% accuracy and an AUC of 0.961, along with significant improvements across multiple interpretability metrics. Qualitative analyses further show that the spatial arrangement of the learned embeddings aligns with clinicians' application of the ABC rule, effectively bridging the gap between high-performance classification and clinical trust.
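The dual-projection-head alignment described in the abstract can be sketched as a CLIP-style symmetric contrastive objective: one head projects ViT image embeddings and a second head projects encoded ABC-criterion text into a shared space, where matched image-text pairs are pulled together. This is a minimal NumPy sketch under assumed dimensions and randomly initialized weights; the function names, head architecture, and temperature are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def projection_head(x, w1, b1, w2, b2):
    """Two-layer projection head (linear -> ReLU -> linear), L2-normalized."""
    h = np.maximum(x @ w1 + b1, 0.0)
    z = h @ w2 + b2
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def log_softmax(m):
    """Numerically stable row-wise log-softmax."""
    m = m - m.max(axis=1, keepdims=True)
    return m - np.log(np.exp(m).sum(axis=1, keepdims=True))

def symmetric_info_nce(z_img, z_txt, temperature=0.07):
    """Symmetric InfoNCE over cosine similarities.

    Diagonal entries (matched image / ABC-description pairs) are the
    positives; the loss averages the image->text and text->image terms.
    """
    logits = (z_img @ z_txt.T) / temperature
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits)[idx, idx].mean()
    loss_t2i = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

# Assumed dimensions: ViT [CLS] embeddings (768-d), encoded ABC-criterion
# text (512-d), shared projection space (128-d), batch of 8 pairs.
img_dim, txt_dim, proj_dim, batch = 768, 512, 128, 8
img_feats = rng.standard_normal((batch, img_dim))
txt_feats = rng.standard_normal((batch, txt_dim))

# Dual projection heads: one for the image modality, one for the text modality.
wi1 = rng.standard_normal((img_dim, proj_dim)) * 0.02; bi1 = np.zeros(proj_dim)
wi2 = rng.standard_normal((proj_dim, proj_dim)) * 0.02; bi2 = np.zeros(proj_dim)
wt1 = rng.standard_normal((txt_dim, proj_dim)) * 0.02; bt1 = np.zeros(proj_dim)
wt2 = rng.standard_normal((proj_dim, proj_dim)) * 0.02; bt2 = np.zeros(proj_dim)

z_img = projection_head(img_feats, wi1, bi1, wi2, bi2)
z_txt = projection_head(txt_feats, wt1, bt1, wt2, bt2)
loss = symmetric_info_nce(z_img, z_txt)
```

Minimizing this loss pulls each image embedding toward the embedding of its own ABC description and away from the others, which is what lets the learned space be read off in clinical terms.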
Problem

Research questions and friction points this paper is trying to address.

Addresses melanoma diagnosis opacity via contrastive learning
Aligns clinical ABC criteria with visual features for interpretability
Generates structured textual explanations to build clinical trust
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning aligns clinical criteria with visual features
Vision Transformer maps ABC rules into embedding space
LLM generates structured textual explanations for interpretability
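The report-generation step in the bullets above can be sketched as serializing the model's per-criterion scores into a prompt for an LLM. This is a hypothetical illustration: the prompt template, score names, and report sections are assumptions, and the actual LLM call is omitted.

```python
def build_report_prompt(abc_scores, prediction, confidence):
    """Serialize ABC criterion scores into an LLM prompt requesting a
    structured diagnostic report (template is illustrative only)."""
    lines = [
        "You are a dermatology assistant. Write a structured melanoma report.",
        f"Model prediction: {prediction} (confidence {confidence:.2f})",
        "Dermatoscopic criterion scores (0 = absent, 1 = strongly present):",
    ]
    for criterion, score in abc_scores.items():
        lines.append(f"- {criterion}: {score:.2f}")
    lines.append("Report sections: Findings, ABC assessment, Impression.")
    return "\n".join(lines)

# Example with made-up attribute scores from the vision model.
prompt = build_report_prompt(
    {"Asymmetry": 0.81, "Border irregularity": 0.66, "Color variation": 0.74},
    prediction="melanoma",
    confidence=0.93,
)
```

The resulting prompt string would then be passed to whatever LLM backend the system uses, grounding the generated report in the same criterion scores that drive the classification.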
Junwen Zheng
Nanyang Technological University, Singapore
Xinran Xu
Nanyang Technological University, Singapore
Li Rong Wang
Nanyang Technological University, Singapore; Centre for Frontier AI Research, A*STAR, Singapore
Chang Cai
Nanyang Technological University, Singapore
Lucinda Siyun Tan
National Skin Centre, National Healthcare Group, Singapore
Dingyuan Wang
National Skin Centre, National Healthcare Group, Singapore
Hong Liang Tey
Nanyang Technological University, Singapore; National Skin Centre, National Healthcare Group, Singapore
Xiuyi Fan
Nanyang Technological University