🤖 AI Summary
This study systematically evaluates the robustness and potential sex bias of AI models for skin cancer diagnosis. Using the PAD-UFES-20 dataset, we compare logistic regression (LR) trained on handcrafted ABCDE/7-point checklist features against a fine-tuned ResNet-50 (CNN), assessing performance across training sets with varying sex compositions. We adapt a cross-sex robustness evaluation framework, previously applied to Alzheimer's disease, to dermatology AI. Results demonstrate strong overall robustness for both models; however, the CNN exhibits a statistically significant male bias, achieving higher accuracy and AUROC for male patients than for female patients (p < 0.01), whereas LR shows no significant sex disparity. These findings uncover latent sex bias in clinical AI systems and underscore the role of medically informed feature engineering in mitigating algorithmic bias. The work provides a methodological foundation and empirical evidence for developing trustworthy, equitable AI tools for skin cancer diagnosis.
📝 Abstract
Deep learning has been reported to achieve high performance in skin cancer detection, yet many challenges remain regarding the reproducibility of results and potential biases. This study is a replication (different data, same analysis) of a study on Alzheimer's disease [28] that examined the robustness of logistic regression (LR) and convolutional neural networks (CNNs) across patient sexes. We explore sex bias in skin cancer detection using the PAD-UFES-20 dataset, with LR trained on handcrafted features reflecting dermatological guidelines (the ABCDE rule and the 7-point checklist) and a pre-trained ResNet-50 model. We evaluate these models in alignment with [28], across multiple training datasets with varied sex composition, to determine their robustness. Our results show that both the LR and the CNN were robust to the sex distributions, but they also revealed that the CNN had a significantly higher accuracy (ACC) and area under the receiver operating characteristic curve (AUROC) for male patients than for female patients. We hope these findings contribute to the growing body of work investigating potential bias in popular medical machine learning methods. The data and the scripts needed to reproduce our results can be found in our GitHub repository.
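The per-sex comparison described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the 0.5 decision threshold, and the tie-free AUROC computation (via the rank-sum / Mann-Whitney U equivalence) are assumptions made for the example.

```python
import numpy as np


def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney U) equivalence.

    Assumes both classes are present and scores have no ties
    (a full implementation would average tied ranks).
    """
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score), dtype=float)
    ranks[order] = np.arange(1, len(y_score) + 1)  # rank 1 = lowest score
    n_pos = int(np.sum(y_true))
    n_neg = len(y_true) - n_pos
    # Sum of positive-class ranks, shifted and normalized to [0, 1].
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def per_group_metrics(y_true, y_score, group, threshold=0.5):
    """Accuracy and AUROC per subgroup (e.g. patient sex)."""
    out = {}
    for g in np.unique(group):
        m = group == g
        out[str(g)] = {
            "acc": float(np.mean((y_score[m] >= threshold) == y_true[m])),
            "auroc": auroc(y_true[m], y_score[m]),
        }
    return out


# Toy example: two patients per sex, binary malignancy labels.
y = np.array([0, 1, 0, 1])
scores = np.array([0.1, 0.8, 0.4, 0.35])
sex = np.array(["M", "M", "F", "F"])
print(per_group_metrics(y, scores, sex))
```

A disparity such as the one reported for the CNN would show up here as a systematic gap between the `"M"` and `"F"` entries; the paper additionally tests whether such gaps are statistically significant, which this sketch omits.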