Decentralized Gradient-Free Methods for Stochastic Non-Smooth Non-Convex Optimization

📅 2023-10-18
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 2
Influential: 1
🤖 AI Summary
This paper addresses Lipschitz continuous, non-smooth, non-convex stochastic optimization in decentralized networks without requiring gradient information. We propose two zeroth-order distributed algorithms: DGFM and its enhanced variant DGFM+. DGFM+ is the first method to integrate randomized smoothing, gradient tracking, and variance reduction in a decentralized zeroth-order setting, incorporating a novel double-batch sampling scheme that improves the convergence complexity to $O(d^{3/2}\delta^{-1}\varepsilon^{-3})$. Theoretically, both algorithms are proven to converge to a $(\delta,\varepsilon)$-Goldstein stationary point. The framework supports flexible oracle queries, including single-sample, mini-batch, and periodic large-batch evaluations. Empirical results on real-world datasets demonstrate that DGFM+ significantly outperforms existing decentralized zeroth-order methods in both convergence speed and solution quality.
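As context, the zeroth-order oracle mentioned in the summary is typically realized through a randomized-smoothing gradient estimator that queries only function values. Below is a minimal sketch of the standard two-point construction, assuming a NumPy objective `f`; the paper's exact estimator may differ in constants and sampling details.

```python
import numpy as np

def zo_gradient_estimate(f, x, delta, rng):
    """Two-point zeroth-order estimate of the gradient of the randomized-
    smoothing surrogate f_delta(x) = E_u[f(x + delta * u)].
    Only function values of f are queried, never gradients."""
    d = x.shape[0]
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                      # uniform direction on the unit sphere
    finite_diff = f(x + delta * w) - f(x - delta * w)
    return (d / (2.0 * delta)) * finite_diff * w

# Example on a non-smooth, non-convex test objective.
rng = np.random.default_rng(0)
f = lambda x: np.abs(x).sum() - 0.5 * np.linalg.norm(x)
g = zo_gradient_estimate(f, np.ones(5), delta=0.1, rng=rng)
```

In DGFM, each node would query such an oracle on a single local sample per iteration, which keeps the per-node computational cost low.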
📝 Abstract
We consider decentralized gradient-free optimization for minimizing Lipschitz continuous functions that satisfy neither the smoothness nor the convexity assumption. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method+ (DGFM+). Based on the techniques of randomized smoothing and gradient tracking, DGFM requires the computation of the zeroth-order oracle of a single sample in each iteration, making it less demanding in terms of computational resources for individual computing nodes. Theoretically, DGFM achieves a complexity of $O(d^{3/2}\delta^{-1}\varepsilon^{-4})$ for obtaining a $(\delta,\varepsilon)$-Goldstein stationary point. DGFM+, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior. It samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to $O(d^{3/2}\delta^{-1}\varepsilon^{-3})$. Moreover, experimental results underscore the empirical advantages of our proposed algorithms when applied to real-world datasets.
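To make the gradient-tracking component mentioned in the abstract concrete, here is a minimal sketch of one synchronous decentralized gradient-tracking step over a mixing matrix `W`. The function names, shapes, and step-size handling are illustrative assumptions, not the paper's exact DGFM update.

```python
import numpy as np

def gradient_tracking_step(X, Y, G_old, grad_est, W, eta):
    """One synchronous decentralized gradient-tracking step over n nodes.
    X, Y     : (n, d) stacks of local iterates and gradient trackers.
    G_old    : (n, d) local gradient estimates at X.
    grad_est : maps an (n, d) stack of iterates to (n, d) local estimates,
               e.g. built from a zeroth-order oracle as sketched above.
    W        : (n, n) doubly stochastic mixing matrix of the network.
    eta      : step size."""
    X_next = W @ X - eta * Y           # consensus averaging plus a local descent step
    G_new = grad_est(X_next)           # fresh local estimates at the new iterates
    Y_next = W @ Y + G_new - G_old     # trackers follow the network-average estimate
    return X_next, Y_next, G_new
```

The tracking variable `Y` lets every node follow the network-wide average of the local gradient estimates using only neighbor communication encoded in `W`.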
Problem

Research questions and friction points this paper is trying to address.

Decentralized optimization for non-smooth non-convex functions
Minimizing Lipschitz continuous functions without gradients
Achieving Goldstein stationarity in stochastic settings
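For reference, the Goldstein notion of stationarity used throughout can be stated as follows (a standard definition, included here only for context):

```latex
% x is a (\delta, \varepsilon)-Goldstein stationary point of f if the Goldstein
% \delta-subdifferential contains an element of norm at most \varepsilon:
\[
  \partial_\delta f(x) \;=\; \operatorname{conv}\Bigl(\bigcup_{y \in \mathbb{B}(x,\delta)} \partial f(y)\Bigr),
  \qquad
  \operatorname{dist}\bigl(0,\, \partial_\delta f(x)\bigr) \le \varepsilon .
\]
```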
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized gradient-free optimization without smoothness or convexity
Randomized smoothing and gradient tracking techniques
Variance reduction with mini-batching for improved convergence
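The variance-reduction idea in the last item is in the spirit of SPIDER-type estimators: a large batch periodically refreshes the gradient estimate, and cheap mini-batch corrections are applied in between. The sketch below is illustrative; the oracle signature `zo_grad(x, i)`, batch sizes, and data indexing are assumptions rather than DGFM+'s exact double-batch scheme.

```python
import numpy as np

def vr_zo_estimate(t, q, x, x_prev, v_prev, zo_grad, rng,
                   num_data=10_000, small_batch=16, large_batch=256):
    """SPIDER-style variance-reduced estimate built on a per-sample
    zeroth-order oracle zo_grad(x, i).  Every q iterations a large batch
    refreshes the estimate; in between, a mini-batch correction is added
    to the previous estimate v_prev."""
    if t % q == 0:
        idx = rng.integers(0, num_data, size=large_batch)     # periodic large batch
        return np.mean([zo_grad(x, i) for i in idx], axis=0)
    idx = rng.integers(0, num_data, size=small_batch)         # cheap mini-batch correction
    correction = np.mean([zo_grad(x, i) - zo_grad(x_prev, i) for i in idx], axis=0)
    return v_prev + correction
```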
Zhenwei Lin (Shanghai University of Finance and Economics)
Jingfan Xia (Shanghai University of Finance and Economics)
Qi Deng (Shanghai University of Finance and Economics)
Luo Luo (Fudan University)
Machine Learning · Optimization · Linear Algebra