Uniform Stability and Generalization Error of GD and SGD on Fixed-Point Parameters

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the generalization error and stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces, where each update is followed by deterministic or stochastic rounding. Using uniform stability and uniform argument stability, and assuming convex, Lipschitz, and smooth losses, the authors derive generalization bounds for both algorithms. The main results are threefold. First, deterministic rounding degrades GD's generalization error from $O(T/n)$ to $O(T/\sqrt{n})$, with matching lower bounds, and makes its uniform stability $\Omega(T)$, so stability-based bounds become vacuous. Second, SGD with deterministic rounding retains nontrivial uniform stability, with tight bounds of $O(T/n)$ in one dimension and $O(T^2/n)$ in higher dimensions. Third, stochastic rounding can introduce generalization error that grows with the dimension, and its uniform argument stability admits upper bounds that are tight when the loss is a sum of coordinate-wise functions.
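For reference, the stability notions used here, uniform stability and its parameter-level (argument) counterpart, are usually defined as below. These are the standard textbook forms, and the notation is an assumption rather than the paper's own: $A(S)$ is the output of algorithm $A$ on dataset $S$, $\ell(w; z)$ is the loss of parameters $w$ on example $z$, $S \simeq S'$ means two datasets of size $n$ that differ in one example, and expectations are over the algorithm's internal randomness. The paper may state them in a slightly different form.

$$
\sup_{S \simeq S'} \, \sup_{z} \, \big| \mathbb{E}[\ell(A(S); z)] - \mathbb{E}[\ell(A(S'); z)] \big| \le \varepsilon
\qquad \text{(uniform stability)}
$$

$$
\sup_{S \simeq S'} \, \mathbb{E}\, \| A(S) - A(S') \| \le \delta
\qquad \text{(uniform argument stability)}
$$

An $\varepsilon$-uniformly stable algorithm has expected generalization error at most $\varepsilon$, and for an $L$-Lipschitz loss, argument stability $\delta$ implies uniform stability $L\delta$.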

📝 Abstract

We analyze generalization error, uniform stability, and uniform argument stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces, where each update involves deterministic or stochastic rounding. We show that deterministic rounding degrades the generalization error of GD on convex, Lipschitz, and smooth loss functions, increasing the rate from $O(T/n)$ to $O(T/\sqrt{n})$, and establish matching lower bounds. We further prove that uniform stability of GD becomes $\Omega(T)$, showing that stability-based generalization bounds are vacuous in this setting. In contrast, for the same losses, stochastic gradient descent with deterministic rounding admits nontrivial uniform stability guarantees, which differ qualitatively from the real-valued case and exhibit distinct dependencies on the number of iterations and the dimension: we prove tight bounds $O(T/n)$ for one dimension and $O(T^2/n)$ for higher dimensions. We also show that stochastic rounding can introduce generalization error that increases with the dimension; such a phenomenon is absent in standard real-valued optimization and in the deterministic rounding case. Finally, we provide upper bounds on uniform argument stability for stochastic rounding schemes and show that these bounds are tight when the loss can be represented as a sum of coordinate-wise functions.
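To make the setting concrete, the following is a minimal sketch of gradient descent on a fixed-point grid, where each update is rounded either deterministically or stochastically. It is an illustration under stated assumptions, not the paper's algorithm: the function names (`round_nearest`, `round_stochastic`, `rounded_gd`), the grid spacing `delta`, the round-to-nearest tie rule, and the toy quadratic loss are all hypothetical choices made here.

```python
import math
import random

def round_nearest(x, delta):
    """Deterministic rounding to the nearest multiple of delta (ties round up).
    Assumed rounding rule; the paper's exact scheme is not given in this summary."""
    return delta * math.floor(x / delta + 0.5)

def round_stochastic(x, delta, rng):
    """Stochastic rounding to one of the two neighboring grid points.
    Rounds up with probability equal to the fractional position of x
    between them, so the result equals x in expectation."""
    lo = math.floor(x / delta)
    frac = x / delta - lo
    return delta * (lo + (1 if rng.random() < frac else 0))

def rounded_gd(grad, w0, lr, steps, rounder):
    """Gradient descent where every coordinate is rounded after each step.
    `rounder` maps a real number to a grid point."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [rounder(wi - lr * gi) for wi, gi in zip(w, g)]
    return w

# Toy convex, smooth loss f(w) = 0.5 * (w - 0.3)^2 in one dimension (hypothetical).
def grad(w):
    return [w[0] - 0.3]

delta = 0.25  # grid spacing, chosen to be exactly representable in binary
rng = random.Random(0)

w_det = rounded_gd(grad, [2.0], lr=0.5, steps=20,
                   rounder=lambda x: round_nearest(x, delta))
w_sto = rounded_gd(grad, [2.0], lr=0.5, steps=20,
                   rounder=lambda x: round_stochastic(x, delta, rng))
print(w_det, w_sto)
```

Both runs stay on the grid of multiples of `delta`. The deterministic variant can stall once the rounded update maps back to the current point, while the stochastic variant keeps moving between neighboring grid points; this is the kind of qualitative difference between the two rounding modes that the abstract's stability results quantify.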

Problem

Research questions and friction points this paper is trying to address.

generalization error

uniform stability

discrete parameter spaces

gradient descent

stochastic rounding

Innovation

Methods, ideas, or system contributions that make the work stand out.

uniform stability

discrete parameter space

stochastic rounding