🤖 AI Summary
This work addresses two limitations of differentially private stochastic gradient descent (DP-SGD). First, fixed clipping thresholds constrain model utility. Second, existing adaptive clipping strategies are not designed jointly with momentum mechanisms. The paper proposes DP-MacAdam, an algorithm that, for the first time, jointly designs adaptive clipping and Adam-like momentum updates around a single shared set of gradient mean and variance estimates, with the variance estimated in a bias-free manner. This approach balances strong privacy guarantees against convergence without requiring manual tuning of the clipping threshold. Empirical results demonstrate that, under the same privacy budget, DP-MacAdam significantly outperforms DP-SGD, AdaClip, and DP-Adam in terms of both model accuracy and training efficiency.
📝 Abstract
Differentially private stochastic gradient descent (DP-SGD) has become the standard framework for privacy-preserving machine learning, yet its reliance on a fixed gradient clipping threshold to limit sensitivity remains a significant practical limitation. Adaptive clipping algorithms such as AdaClip shift and scale the gradient before clipping and adding noise, so that the clipped gradient yields a more informative descent direction. The shift and scaling parameters are selected adaptively based on the empirical mean and variance of the gradients. However, existing adaptive clipping algorithms do not also use these empirical estimates for momentum to accelerate training itself. Conversely, DP-Adam exploits Adam-like momentum updates based on the gradient mean and variance to accelerate training, but does not use these estimates for adaptive clipping. In this work, we propose Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum (DP-MacAdam), a novel algorithm that combines these two approaches and uses the same mean and variance estimates for both clipping and momentum. Our analysis shows that DP-MacAdam estimates the gradient variances in a bias-free manner. In addition, we empirically evaluate the privacy and accuracy of DP-MacAdam and demonstrate that it achieves improved model utility compared to the DP-SGD, AdaClip, and DP-Adam baselines, without requiring manual tuning of the clipping threshold.
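The abstract combines three ingredients:

- DP-SGD's clip-then-noise aggregation.
- AdaClip-style shifting and scaling of gradients before clipping.
- Adam-like momentum driven by mean and variance estimates.

The Python sketch below shows one way a single pair of moment estimates could feed both the shift/scale transform and an Adam-style step. It is not the paper's DP-MacAdam. The following are all hypothetical choices made for illustration:

- The names `clip_to_norm`, `private_mean` and `SharedMomentClipper`.
- Deriving the variance as the bias-corrected second moment minus the squared mean.
- The `var_floor` clamp.

The variance computed here also absorbs the injected Gaussian noise, which is the kind of bias the paper's analysis is reported to remove. This sketch makes no such correction and performs no privacy accounting.

```python
import math
import random


def clip_to_norm(vec, c):
    """Rescale vec so that its L2 norm is at most c (per-example clipping)."""
    norm = math.sqrt(sum(v * v for v in vec))
    factor = min(1.0, c / norm) if norm > 0.0 else 1.0
    return [v * factor for v in vec]


def private_mean(vectors, c, sigma, rng):
    """DP-SGD-style aggregation: clip each vector to norm c, sum,
    add Gaussian noise with std sigma * c per coordinate, divide by batch size."""
    total = [0.0] * len(vectors[0])
    for vec in vectors:
        for i, v in enumerate(clip_to_norm(vec, c)):
            total[i] += v
    return [(t + rng.gauss(0.0, sigma * c)) / len(vectors) for t in total]


class SharedMomentClipper:
    """Hypothetical illustration: one pair of Adam-style moments drives both
    the shift/scale applied before clipping and the parameter update."""

    def __init__(self, dim, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, var_floor=1e-4):
        self.m = [0.0] * dim  # EMA of privatized gradients (first moment)
        self.v = [0.0] * dim  # EMA of squared privatized gradients (second moment)
        self.t = 0
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.eps, self.var_floor = eps, var_floor

    def _bias_corrected(self):
        m_hat = [m / (1 - self.beta1 ** self.t) for m in self.m]
        v_hat = [v / (1 - self.beta2 ** self.t) for v in self.v]
        return m_hat, v_hat

    def _estimates(self):
        """Mean and clamped variance derived from the shared moments."""
        if self.t == 0:
            # No history yet: identity transform (shift 0, scale 1).
            return [0.0] * len(self.m), [1.0] * len(self.m)
        m_hat, v_hat = self._bias_corrected()
        var = [max(v - m * m, self.var_floor) for m, v in zip(m_hat, v_hat)]
        return m_hat, var

    def step(self, params, per_example_grads, c, sigma, rng):
        mean, var = self._estimates()
        std = [math.sqrt(s) for s in var]
        # Shift and scale each per-example gradient before clipping (AdaClip-style idea).
        z = [[(g - a) / s for g, a, s in zip(grad, mean, std)] for grad in per_example_grads]
        noisy_z = private_mean(z, c, sigma, rng)
        # Undo the transform; mean/std depend only on earlier privatized outputs.
        noisy_g = [nz * s + a for nz, a, s in zip(noisy_z, mean, std)]
        # Update the shared moments using the privatized gradient only.
        self.t += 1
        self.m = [self.beta1 * m + (1 - self.beta1) * g for m, g in zip(self.m, noisy_g)]
        self.v = [self.beta2 * v + (1 - self.beta2) * g * g for v, g in zip(self.v, noisy_g)]
        m_hat, v_hat = self._bias_corrected()
        # Adam-style step: bias-corrected mean over root of bias-corrected second moment.
        return [p - self.lr * mh / (math.sqrt(vh) + self.eps)
                for p, mh, vh in zip(params, m_hat, v_hat)]


if __name__ == "__main__":
    rng = random.Random(0)
    opt = SharedMomentClipper(dim=2, lr=0.1)
    params = [1.0, -1.0]
    batch = [[0.5, -0.2], [0.4, -0.3], [0.6, -0.1]]
    for _ in range(3):
        params = opt.step(params, batch, c=1.0, sigma=1.0, rng=rng)
    print(params)
```

At step t, the shift and scale depend only on gradients that were already privatized. Undoing the transform after adding noise is therefore post-processing. The per-step sensitivity is still set by the clipping norm `c` applied to the transformed vectors.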