Steps Adaptive Decay DPSGD: Enhancing Performance on Imbalanced Datasets with Differential Privacy with HAM10000

📅 2025-07-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Differential privacy (DP) degrades performance on small, class-imbalanced medical image datasets (e.g., HAM10000), primarily because gradient clipping suppresses minority-class signals while majority-class gradients dominate, which leads to suboptimal convergence. To address this, we propose Steps Adaptive Decay DPSGD (SAD-DPSGD), which applies a linear decay schedule to the noise and the gradient clipping threshold. Higher clipping thresholds early in training preserve informative minority-class gradients and ease the tension between privacy preservation and model convergence. The method also allocates more of the privacy budget to the initial training phases. Under ε = 3.0 and δ = 10⁻³, it achieves a 2.15% accuracy gain over Auto-DPSGD on HAM10000, improving the privacy–utility trade-off in imbalanced learning scenarios.

📝 Abstract

When applying machine learning to medical image classification, data leakage is a critical issue. Previous methods, such as adding noise to gradients for differential privacy, work well on large datasets like MNIST and CIFAR-100, but fail on small, imbalanced medical datasets like HAM10000. This is because the imbalanced distribution causes gradients from minority classes to be clipped and lose crucial information, while majority classes dominate. As a result, the model falls into suboptimal solutions early in training. To address this, we propose SAD-DPSGD (Steps Adaptive Decay DPSGD), which uses a linear decay mechanism for the noise and clipping thresholds. By allocating more privacy budget and using higher clipping thresholds in the initial training phases, the model avoids suboptimal solutions and achieves better performance. Experiments show that SAD-DPSGD outperforms Auto-DPSGD on HAM10000, improving accuracy by 2.15% under $ε = 3.0$, $δ = 10^{-3}$.
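To make the idea of a linearly decaying clipping threshold concrete, here is a minimal sketch of one DP-SGD aggregation step in plain Python. It is not the authors' implementation. The function names, the start and end thresholds (2.0 and 0.5), and the noise multiplier (1.0) are hypothetical values chosen for illustration. Only the clipping threshold is scheduled here and the noise multiplier is held fixed, so the absolute noise standard deviation decays together with the threshold. How the paper schedules the noise and accounts for the privacy budget is not stated in this summary, and no privacy accounting is performed below.

```python
import math
import random


def linear_decay(start, end, step, total_steps):
    """Linearly interpolate from `start` (step 0) to `end` (last step)."""
    if total_steps <= 1:
        return float(start)
    frac = step / (total_steps - 1)
    return float(start + (end - start) * frac)


def clip_gradient(grad, clip_norm):
    """Scale one per-sample gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [g * scale for g in grad]


def noisy_mean_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, and average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            total[i] += g
    # Noise standard deviation is tied to the current clipping threshold.
    std = noise_multiplier * clip_norm
    n = len(per_sample_grads)
    return [(t + rng.gauss(0.0, std)) / n for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    total_steps = 5
    for step in range(total_steps):
        # Hypothetical schedule: threshold decays from 2.0 to 0.5.
        clip_norm = linear_decay(2.0, 0.5, step, total_steps)
        # Stand-in per-sample gradients (batch of 8, dimension 4).
        grads = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
        update = noisy_mean_gradient(grads, clip_norm, 1.0, rng)
        print(step, round(clip_norm, 3), [round(u, 3) for u in update])
```

With a higher threshold in the first steps, large-norm gradients are scaled down less, which is the effect the abstract relies on for minority-class samples. A real training loop would also need a privacy accountant to verify that the overall (ε, δ) target is met.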

Problem

Research questions and friction points this paper is trying to address.

Addressing data leakage in medical image classification

Improving the utility of differentially private training on imbalanced datasets like HAM10000

Preventing gradient loss in minority classes during training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear decay mechanism for noise

Adaptive allocation of clipping thresholds

Larger privacy budget in the initial training phases
