Decentralized Gradient-Free Methods for Stochastic Non-Smooth Non-Convex Optimization

📅 2023-10-18
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 2
Influential: 1
🤖 AI Summary
This paper addresses Lipschitz continuous, non-smooth, non-convex stochastic optimization in decentralized networks without requiring gradient information. We propose two zeroth-order distributed algorithms: DGFM and its enhanced variant DGFM+. DGFM+ is the first method to integrate randomized smoothing, gradient tracking, and variance reduction in a decentralized zeroth-order setting, incorporating a novel double-batch sampling scheme that improves the convergence complexity to $O(d^{3/2}\delta^{-1}\varepsilon^{-3})$. Theoretically, both algorithms are proven to converge to a $(\delta,\varepsilon)$-Goldstein stationary point. The framework supports flexible oracle queries, including single-sample, mini-batch, and periodic large-batch evaluations. Empirical results on real-world datasets demonstrate that DGFM+ significantly outperforms existing decentralized zeroth-order methods in both convergence speed and solution quality.
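As context, the zeroth-order oracle mentioned in the summary is typically realized through a randomized-smoothing gradient estimator that queries only function values. Below is a minimal sketch of the standard two-point construction, assuming a NumPy objective `f`; the paper's exact estimator may differ in constants and sampling details.

```python
import numpy as np

def zo_gradient_estimate(f, x, delta, rng):
    """Two-point zeroth-order estimate of the gradient of the randomized-
    smoothing surrogate f_delta(x) = E_u[f(x + delta * u)].
    Only function values of f are queried, never gradients."""
    d = x.shape[0]
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                      # uniform direction on the unit sphere
    finite_diff = f(x + delta * w) - f(x - delta * w)
    return (d / (2.0 * delta)) * finite_diff * w

# Example on a non-smooth, non-convex test objective.
rng = np.random.default_rng(0)
f = lambda x: np.abs(x).sum() - 0.5 * np.linalg.norm(x)
g = zo_gradient_estimate(f, np.ones(5), delta=0.1, rng=rng)
```

In DGFM, each node would query such an oracle on a single local sample per iteration, which keeps the per-node computational cost low.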
📝 Abstract
We consider decentralized gradient-free optimization for minimizing Lipschitz continuous functions that satisfy neither the smoothness nor the convexity assumption. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method+ (DGFM+). Based on the techniques of randomized smoothing and gradient tracking, DGFM requires the computation of the zeroth-order oracle of a single sample in each iteration, making it less demanding in terms of computational resources for individual computing nodes. Theoretically, DGFM achieves a complexity of $O(d^{3/2}\delta^{-1}\varepsilon^{-4})$ for obtaining a $(\delta,\varepsilon)$-Goldstein stationary point. DGFM+, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior. It samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to $O(d^{3/2}\delta^{-1}\varepsilon^{-3})$. Moreover, experimental results underscore the empirical advantages of our proposed algorithms when applied to real-world datasets.
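To make the gradient-tracking component mentioned in the abstract concrete, here is a minimal sketch of one synchronous decentralized gradient-tracking step over a mixing matrix `W`. The function names, shapes, and step-size handling are illustrative assumptions, not the paper's exact DGFM update.

```python
import numpy as np

def gradient_tracking_step(X, Y, G_old, grad_est, W, eta):
    """One synchronous decentralized gradient-tracking step over n nodes.
    X, Y     : (n, d) stacks of local iterates and gradient trackers.
    G_old    : (n, d) local gradient estimates at X.
    grad_est : maps an (n, d) stack of iterates to (n, d) local estimates,
               e.g. built from a zeroth-order oracle as sketched above.
    W        : (n, n) doubly stochastic mixing matrix of the network.
    eta      : step size."""
    X_next = W @ X - eta * Y           # consensus averaging plus a local descent step
    G_new = grad_est(X_next)           # fresh local estimates at the new iterates
    Y_next = W @ Y + G_new - G_old     # trackers follow the network-average estimate
    return X_next, Y_next, G_new
```

The tracking variable `Y` lets every node follow the network-wide average of the local gradient estimates using only neighbor communication encoded in `W`.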
Problem

Research questions and friction points this paper is trying to address.

Decentralized optimization for non-smooth non-convex functions
Minimizing Lipschitz continuous functions without gradients
Achieving Goldstein stationarity in stochastic settings
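For reference, the Goldstein notion of stationarity used throughout can be stated as follows (a standard definition, included here only for context):

```latex
% x is a (\delta, \varepsilon)-Goldstein stationary point of f if the Goldstein
% \delta-subdifferential contains an element of norm at most \varepsilon:
\[
  \partial_\delta f(x) \;=\; \operatorname{conv}\Bigl(\bigcup_{y \in \mathbb{B}(x,\delta)} \partial f(y)\Bigr),
  \qquad
  \operatorname{dist}\bigl(0,\, \partial_\delta f(x)\bigr) \le \varepsilon .
\]
```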
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized gradient-free optimization without smoothness or convexity
Randomized smoothing and gradient tracking techniques
Variance reduction with mini-batching for improved convergence
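The variance-reduction idea in the last item is in the spirit of SPIDER-type estimators: a large batch periodically refreshes the gradient estimate, and cheap mini-batch corrections are applied in between. The sketch below is illustrative; the oracle signature `zo_grad(x, i)`, batch sizes, and data indexing are assumptions rather than DGFM+'s exact double-batch scheme.

```python
import numpy as np

def vr_zo_estimate(t, q, x, x_prev, v_prev, zo_grad, rng,
                   num_data=10_000, small_batch=16, large_batch=256):
    """SPIDER-style variance-reduced estimate built on a per-sample
    zeroth-order oracle zo_grad(x, i).  Every q iterations a large batch
    refreshes the estimate; in between, a mini-batch correction is added
    to the previous estimate v_prev."""
    if t % q == 0:
        idx = rng.integers(0, num_data, size=large_batch)     # periodic large batch
        return np.mean([zo_grad(x, i) for i in idx], axis=0)
    idx = rng.integers(0, num_data, size=small_batch)         # cheap mini-batch correction
    correction = np.mean([zo_grad(x, i) - zo_grad(x_prev, i) for i in idx], axis=0)
    return v_prev + correction
```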
Zhenwei Lin (Shanghai University of Finance and Economics)
Jingfan Xia (Shanghai University of Finance and Economics)
Qi Deng (Shanghai University of Finance and Economics)
Luo Luo (Fudan University)
Machine Learning · Optimization · Linear Algebra