Robustness and sex differences in skin cancer detection: logistic regression vs CNNs

📅 2025-04-15
🤖 AI Summary
This study evaluates the robustness and potential sex bias of machine learning models for skin cancer detection. Using the PAD-UFES-20 dataset, it compares logistic regression (LR) trained on handcrafted features derived from the ABCDE rule and the 7-point checklist against a pre-trained ResNet-50 convolutional neural network (CNN), assessing performance across training sets with varying sex compositions. The work replicates, on different data, the analysis of an earlier study on Alzheimer's disease [28]. Both models were robust to the sex distribution of the training data; however, the CNN achieved significantly higher accuracy and AUROC for male patients than for female patients (p < 0.01), whereas LR showed no significant sex disparity. These findings point to latent sex bias in a widely used deep learning approach and suggest that clinically informed feature engineering may help limit such bias. The work adds empirical evidence to the study of bias in medical machine learning for skin cancer detection.

📝 Abstract
Deep learning has been reported to achieve high performance in the detection of skin cancer, yet many challenges regarding the reproducibility of results and biases remain. This study is a replication (different data, same analysis) of a study on Alzheimer's disease [28], which examined the robustness of logistic regression (LR) and convolutional neural networks (CNN) across patient sexes. We explore sex bias in skin cancer detection using the PAD-UFES-20 dataset, with LR trained on handcrafted features reflecting dermatological guidelines (ABCDE and the 7-point checklist) and a pre-trained ResNet-50 model. We evaluate these models in alignment with [28]: across multiple training datasets with varied sex composition, to determine their robustness. Our results show that both the LR and the CNN were robust to the sex distributions, but they also revealed that the CNN had a significantly higher accuracy (ACC) and area under the receiver operating characteristic curve (AUROC) for male patients than for female patients. We hope these findings contribute to the growing field of investigating potential bias in popular medical machine learning methods. The data and relevant scripts to reproduce our results can be found in our GitHub repository.
Problem

Research questions and friction points this paper is trying to address.

Investigates sex bias in skin cancer detection accuracy
Compares logistic regression and CNN robustness across sexes
Evaluates LR on features based on dermatological guidelines (ABCDE, 7-point checklist) alongside a pre-trained ResNet-50
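
To make the per-sex comparison concrete, here is a minimal sketch of how accuracy (ACC) and AUROC could be computed separately for male and female patients. It is not the authors' code: the function names, the "M"/"F" group labels, the 0.5 decision threshold, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when one class is missing
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_by_sex(labels, scores, sexes, threshold=0.5):
    """Group predictions by sex and report n, ACC and AUROC per group."""
    groups = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, sexes):
        groups[g][0].append(y)
        groups[g][1].append(s)
    out = {}
    for g, (ys, ss) in groups.items():
        correct = sum((s >= threshold) == (y == 1) for y, s in zip(ys, ss))
        out[g] = {"n": len(ys), "acc": correct / len(ys), "auroc": auroc(ys, ss)}
    return out


# Toy data (hypothetical): 1 = malignant, 0 = benign.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.1, 0.6, 0.7, 0.4, 0.3]
sexes = ["M", "M", "M", "M", "F", "F", "F", "F"]

result = metrics_by_sex(labels, scores, sexes)
for sex, m in sorted(result.items()):
    print(sex, m)
# M: acc 1.0, auroc 1.0; F: acc 0.5, auroc 0.5
```

A gap like the one in this toy output is what a per-sex breakdown is meant to surface; whether a gap is statistically significant would need a separate test over repeated runs.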
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used logistic regression on handcrafted ABCDE and 7-point checklist features
Applied a pre-trained ResNet-50 CNN model
Evaluated robustness across training sets with varied sex composition
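
As a rough illustration of the last point, the sketch below draws fixed-size training subsets with a chosen fraction of female patients. The record layout (`id`, `sex`), the function name `sample_by_sex_ratio`, the subset size, and the list of ratios are hypothetical; the source does not state which compositions were used.

```python
import random


def sample_by_sex_ratio(records, female_fraction, n_total, seed=0):
    """Draw n_total records with the requested share of female patients.

    `records` is a list of dicts with a "sex" key of "F" or "M"
    (an assumed layout, not the dataset's actual schema).
    """
    rng = random.Random(seed)
    females = [r for r in records if r["sex"] == "F"]
    males = [r for r in records if r["sex"] == "M"]
    n_f = round(n_total * female_fraction)
    n_m = n_total - n_f
    if n_f > len(females) or n_m > len(males):
        raise ValueError("not enough records of one sex for this composition")
    subset = rng.sample(females, n_f) + rng.sample(males, n_m)
    rng.shuffle(subset)  # avoid ordering the subset by sex
    return subset


# Toy pool (hypothetical): 50 female and 50 male records.
pool = [{"id": i, "sex": "F" if i < 50 else "M"} for i in range(100)]

# Assumed ratios, for illustration only.
ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
train_sets = {r: sample_by_sex_ratio(pool, r, n_total=40, seed=42) for r in ratios}

for r, subset in train_sets.items():
    n_female = sum(rec["sex"] == "F" for rec in subset)
    print(f"female fraction {r:.2f}: {n_female} F / {len(subset) - n_female} M")
```

Keeping the subset size constant across ratios means that differences in test performance reflect the sex composition of the training data rather than its size.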

Nikolette Pedersen
IT University of Copenhagen, Denmark
Regitze Sydendal
IT University of Copenhagen, Denmark
Andreas Wulff
IT University of Copenhagen, Denmark
Ralf Raumanns
IT University of Copenhagen, Denmark
Eike Petersen
Fraunhofer Institute for Digital Medicine MEVIS, Germany
Veronika Cheplygina
IT University of Copenhagen
meta-research, pattern recognition, machine learning, medical imaging, open science