Clipped SGD Algorithms for Performative Prediction: Tight Bounds for Clipping Bias and Remedies

📅 2024-04-17
📈 Citations: 1
Influential: 0
🤖 AI Summary
This paper addresses the bias induced by gradient clipping in stochastic gradient descent (SGD) under decision-dependent data distributions, where predictive decisions influence subsequent data generation. The authors first rigorously quantify the bias introduced by clipping in the projected clipped SGD (PCSGD) algorithm, deriving tight upper and lower bounds for both strongly convex and non-convex settings and revealing that the sensitivity of the data distribution amplifies this bias. To mitigate it, they characterize an optimal step-size schedule for PCSGD and, alternatively, apply the recent DiceSGD algorithm [Zhang et al., 2024], extending its analysis to show that it is free from clipping bias in the performative setting under practical assumptions. The analysis integrates non-gradient dynamical-systems arguments with models of decision-dependent distribution shift. Numerical experiments validate both the theoretical bias characterization and the efficacy of the proposed remedies.

📝 Abstract
This paper studies the convergence of clipped stochastic gradient descent (SGD) algorithms with decision-dependent data distributions. Our setting is motivated by privacy-preserving optimization algorithms that interact with performative data, where the prediction models can influence future outcomes. This challenging setting involves the non-smooth clipping operator and non-gradient dynamics due to distribution shifts. We make two contributions in pursuit of a performatively stable solution using clipped SGD algorithms. First, we characterize the clipping bias of the projected clipped SGD (PCSGD) algorithm, which is caused by the clipping operator preventing PCSGD from reaching a stable solution. When the loss function is strongly convex, we quantify the lower and upper bounds for this clipping bias and demonstrate a bias amplification phenomenon driven by the sensitivity of the data distribution. When the loss function is non-convex, we bound the magnitude of the stationarity bias. Second, we propose remedies to mitigate the bias, either by utilizing an optimal step-size design for PCSGD or by applying the recent DiceSGD algorithm [Zhang et al., 2024]. Our analysis is also extended to show that the latter algorithm is free from clipping bias in the performative setting. Numerical experiments verify our findings.
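To make the setting concrete, the PCSGD update described in the abstract can be sketched as below. The clipping and projection operators follow their standard Euclidean definitions; the toy decision-dependent least-squares loss, the helper names, and all parameter values are illustrative assumptions, not the paper's actual experiments or bounds.

```python
import numpy as np

def clip(g, c):
    """Clip a gradient vector to Euclidean norm at most c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def project(theta, radius):
    """Project theta onto the Euclidean ball of the given radius
    (a stand-in for the paper's constraint set)."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else (radius / norm) * theta

def pcsgd(grad_sample, theta0, step, clip_c, radius, n_iters, rng):
    """Projected clipped SGD with a decision-dependent distribution:
    each stochastic gradient is drawn from a distribution that
    depends on the current iterate theta."""
    theta = theta0
    for _ in range(n_iters):
        g = grad_sample(theta, rng)  # sample distribution shifts with theta
        theta = project(theta - step * clip(g, clip_c), radius)
    return theta

# Hypothetical performative example: least squares where the data
# mean shifts with the deployed model, y ~ N(eps * theta, I).
# Here eps plays the role of the distribution's sensitivity.
eps = 0.5

def grad_sample(theta, rng):
    y = eps * theta + rng.normal(size=theta.shape)
    return theta - y  # stochastic gradient of 0.5 * ||theta - y||^2

rng = np.random.default_rng(0)
theta = pcsgd(grad_sample, np.ones(2), step=0.05, clip_c=0.1,
              radius=10.0, n_iters=2000, rng=rng)
```

In this toy model the performatively stable point is the origin, but with an aggressive clipping level the iterates settle at a biased neighborhood of it; the paper's bounds quantify how that neighborhood grows with the distribution's sensitivity.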
Problem

Research questions and friction points this paper is trying to address.

Clipped Stochastic Gradient Descent
Bias Issue
Data Sensitivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Biased SGD
Privacy-preserving
Optimized Predictive Modeling