🤖 AI Summary
This work studies the convergence of decentralized stochastic subgradient methods (DSGD-type algorithms) for nonsmooth, nonconvex objective functions that violate Clarke regularity, such as neural networks with non-differentiable activations (e.g., ReLU). We propose a unified analytical framework that, for the first time without assuming Clarke regularity, establishes asymptotic convergence guarantees for mainstream variants including DSGD, DSGD-T, and DSGD-M. Our analysis couples the discrete iterates to the trajectories of a continuous-time differential inclusion and employs a coercive Lyapunov function to characterize the stable set; under mild regularity conditions and diminishing step sizes, we prove that the iterates converge almost surely to this stable set. The framework accommodates both gradient-tracking and momentum mechanisms. Numerical experiments on nonsmooth distributed neural network training confirm both the theoretical guarantees and the practical efficiency of the proposed approach.
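To make the algorithmic setting concrete, below is a minimal, self-contained sketch of plain DSGD on a toy nonsmooth problem. The ring mixing matrix, step-size schedule, and absolute-value local losses are illustrative assumptions for this sketch, not the paper's experimental setup.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n agents (illustrative choice)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    return W

def local_subgradient(x, a):
    """One element of the subdifferential of the nonsmooth local loss f_i(x) = |x - a_i|."""
    return np.sign(x - a)

def dsgd(targets, n_iters=3000, noise=0.1, seed=0):
    """Plain DSGD: consensus averaging followed by a local stochastic subgradient step."""
    rng = np.random.default_rng(seed)
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = rng.normal(size=n)                  # each agent holds its own copy of the variable
    for k in range(n_iters):
        gamma = 1.0 / np.sqrt(k + 1)        # diminishing step sizes, as the theory requires
        g = local_subgradient(x, targets)   # exact local subgradients ...
        g = g + noise * rng.normal(size=n)  # ... corrupted by stochastic oracle noise
        x = W @ x - gamma * g               # mixing (communication) step + subgradient step
    return x

# Agents jointly minimize sum_i |x - a_i|; iterates approach consensus near a median of the a_i.
print(dsgd(np.array([-1.0, 0.0, 1.0, 2.0])))
```

DSGD-T would additionally maintain a tracked estimate of the average subgradient, and DSGD-M would filter `g` through a momentum buffer before the update; both reuse the same mixing step shown here.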
📄 Abstract
In this paper, we focus on decentralized stochastic subgradient-based methods for minimizing nonsmooth nonconvex functions without Clarke regularity, with particular emphasis on the decentralized training of nonsmooth neural networks. We propose a general framework that unifies various decentralized subgradient-based methods, such as decentralized stochastic subgradient descent (DSGD), DSGD with the gradient-tracking technique (DSGD-T), and DSGD with momentum (DSGD-M). To establish the convergence properties of our proposed framework, we relate the discrete iterates to the trajectories of a continuous-time differential inclusion, which is assumed to admit a coercive Lyapunov function with a stable set $\mathcal{A}$. We prove asymptotic convergence of the iterates to the stable set $\mathcal{A}$ under sufficiently small and diminishing step sizes. These results provide the first convergence guarantees for several well-recognized decentralized stochastic subgradient-based methods without Clarke regularity of the objective function. Preliminary numerical experiments demonstrate that our proposed framework yields highly efficient decentralized stochastic subgradient-based methods with convergence guarantees for the training of nonsmooth neural networks.
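As a reference for the updates the abstract names, the textbook forms of the three methods are sketched below for agent $i$ with mixing weights $w_{ij}$, step sizes $\gamma_k$, and stochastic subgradient estimates $g_i^k$. The notation is assumed here for illustration; the paper's precise formulation may differ.

```latex
% Textbook forms of the three updates (schematic; notation assumed,
% not necessarily identical to the paper's formulation).
\begin{align*}
\text{DSGD:}   \quad & x_i^{k+1} = \sum_{j} w_{ij}\, x_j^k - \gamma_k\, g_i^k,\\[2pt]
\text{DSGD-T:} \quad & x_i^{k+1} = \sum_{j} w_{ij}\, x_j^k - \gamma_k\, y_i^k,
  \qquad y_i^{k+1} = \sum_{j} w_{ij}\, y_j^k + g_i^{k+1} - g_i^k,\\[2pt]
\text{DSGD-M:} \quad & m_i^{k+1} = \beta\, m_i^k + (1-\beta)\, g_i^k,
  \qquad x_i^{k+1} = \sum_{j} w_{ij}\, x_j^k - \gamma_k\, m_i^{k+1}.
\end{align*}
% The continuous-time counterpart coupled to these iterates is a
% differential inclusion of the form
\[
  \dot{x}(t) \in -\,\mathcal{G}\bigl(x(t)\bigr),
\]
% where G is a set-valued map generalizing the gradient (the Clarke
% subdifferential is one choice, but Clarke regularity is not assumed),
% and the coercive Lyapunov function decreases along its trajectories
% toward the stable set A.
```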