🤖 AI Summary
Spitz tumors share atypical histological features with conventional melanomas, which makes them difficult to diagnose. This study examined how well AI models using histological and/or clinical features can (i) distinguish Spitz tumors from conventional melanomas, (ii) predict the underlying genetic aberration of a Spitz tumor, and (iii) predict its diagnostic category. The models were developed and validated on 393 Spitz tumors and 379 conventional melanomas. The best model, based on UNI features, reached an AUROC of 0.95 and an accuracy of 0.86 in differentiating Spitz tumors from conventional melanomas. Accuracy was 0.55 for the genetic aberration (chance level 0.25) and 0.51 for the diagnostic category (chance level 0.33). In a reader study, the models scored higher than four experienced pathologists on all three tasks, although most individual differences were not statistically significant. A workflow simulation suggested that AI-based recommendations for ancillary diagnostic testing could reduce material costs, turnaround times, and the number of examinations.
📝 Abstract
Spitz tumors are diagnostically challenging because their atypical histological features overlap with those of conventional melanomas. We investigated to what extent AI models, using histological and/or clinical features, can: (1) distinguish Spitz tumors from conventional melanomas; (2) predict the underlying genetic aberration of Spitz tumors; and (3) predict the diagnostic category of Spitz tumors. The AI models were developed and validated using a dataset of 393 Spitz tumors and 379 conventional melanomas. Predictive performance was measured using the AUROC and the accuracy. The performance of the AI models was compared with that of four experienced pathologists in a reader study. Moreover, a simulation experiment was conducted to investigate how implementing AI-based recommendations for ancillary diagnostic testing would affect the workflow of the pathology department. The best AI model based on UNI features reached an AUROC of 0.95 and an accuracy of 0.86 in differentiating Spitz tumors from conventional melanomas. The genetic aberration was predicted with an accuracy of 0.55, compared with 0.25 for random guessing. The diagnostic category was predicted with an accuracy of 0.51, compared with a chance-level accuracy of 0.33. On all three tasks, the AI models performed better than the four pathologists, although differences were not statistically significant for most individual comparisons. Based on the simulation experiment, implementing AI-based recommendations for ancillary diagnostic testing could reduce material costs, turnaround times, and the number of examinations. In conclusion, the AI models achieved strong predictive performance in distinguishing between Spitz tumors and conventional melanomas. On the more challenging tasks of predicting the genetic aberration and the diagnostic category of Spitz tumors, the AI models performed better than random chance.
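To make the reported metrics concrete, the sketch below shows one common way to compute AUROC, accuracy, and a uniform chance-level baseline. It is an illustration only, not the authors' evaluation code: the function names and the toy inputs are hypothetical, and the class counts (four and three) are an assumption inferred from the stated chance levels of 0.25 and 0.33 under uniform random guessing.

```python
from typing import Hashable, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of cases where the predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over n_classes."""
    if n_classes < 1:
        raise ValueError("n_classes must be positive")
    return 1.0 / n_classes


if __name__ == "__main__":
    # Toy data, NOT from the study: 1 = melanoma, 0 = Spitz tumor (hypothetical coding).
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    print("AUROC:", auroc(toy_labels, toy_scores))

    # Assumed class counts that reproduce the chance levels quoted in the abstract.
    print("chance, 4 classes:", round(chance_accuracy(4), 2))
    print("chance, 3 classes:", round(chance_accuracy(3), 2))
```

Read against these baselines, an accuracy of 0.55 is roughly twice the four-class chance level, and 0.51 is about one and a half times the three-class chance level. The pairwise AUROC above is quadratic in the number of cases; a rank-based implementation would be preferable for large cohorts.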