🤖 AI Summary
This paper studies the heterogeneous federated stochastic approximation problem under Markovian sampling: $M$ agents each hold a nonlinear local operator, and the goal is to compute the root of their population-averaged operator via intermittent server communication. Existing methods rely on projection steps to ensure bounded iterates, and fail to guarantee convergence or collaboration gains when heterogeneity and Markovian sampling arise simultaneously. The authors propose FedHSA, a novel algorithm integrating multi-step local updates, bias correction, and a refined Markovian error analysis. FedHSA is the first method shown to converge exactly without projections, and it admits the first tight finite-time convergence bound in this setting. Theoretically, collaboration yields an $M$-fold linear speedup in sample complexity. The results apply directly to heterogeneous federated reinforcement learning, including policy evaluation and control.
📝 Abstract
Motivated by collaborative reinforcement learning (RL) and optimization with time-correlated data, we study a generic federated stochastic approximation problem involving $M$ agents, where each agent is characterized by an agent-specific (potentially nonlinear) local operator. The goal is for the agents to communicate intermittently via a server to find the root of the average of the agents' local operators. The generality of our setting stems from allowing for (i) Markovian data at each agent and (ii) heterogeneity in the roots of the agents' local operators. The limited recent works that have accounted for both these features in a federated setting fail to guarantee convergence to the desired point or to show any benefit of collaboration; furthermore, they rely on projection steps in their algorithms to guarantee bounded iterates. Our work overcomes each of these limitations. We develop a novel algorithm titled `FedHSA`, and prove that it guarantees convergence to the correct point, while enjoying an $M$-fold linear speedup in sample complexity due to collaboration. To our knowledge, *this is the first finite-time result of its kind*, and establishing it (without relying on a projection step) entails a fairly intricate argument that accounts for the interplay between complex temporal correlations due to Markovian sampling, multiple local steps to save communication, and the drift effects induced by heterogeneous local operators. Our results have implications for a broad class of heterogeneous federated RL problems (e.g., policy evaluation and control) with function approximation, where the agents' Markov decision processes can differ in their probability transition kernels and reward functions.
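To make the problem setup concrete, here is a minimal numerical sketch of the federated stochastic approximation template the abstract describes: each agent runs several local SA steps on its own operator, and a server periodically averages the iterates. This is an illustration with synthetic linear operators $G_i(x) = A_i x + b_i$ and i.i.d. noise standing in for Markovian data; it is *not* the paper's FedHSA algorithm (in particular, it omits FedHSA's bias correction, so plain averaging leaves a residual heterogeneity-induced drift bias, which is exactly the gap the paper addresses). All matrices, step sizes, and dimensions below are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d = 4, 3          # number of agents, iterate dimension (synthetic)
H, R = 5, 400        # local steps per round, number of communication rounds
alpha = 0.05         # constant step size

# Heterogeneous linear local operators G_i(x) = A_i x + b_i, with each
# A_i negative definite so that local SA dynamics are stable.
S = [rng.standard_normal((d, d)) for _ in range(M)]
A = [-(np.eye(d) + 0.1 * (s @ s.T)) for s in S]
b = [rng.standard_normal(d) for _ in range(M)]

# Target: the root of the *average* operator, (1/M) sum_i G_i(x*) = 0.
A_bar = sum(A) / M
b_bar = sum(b) / M
x_star = np.linalg.solve(A_bar, -b_bar)

x_server = np.zeros(d)
for _ in range(R):
    # Each agent starts the round from the server iterate.
    x_local = [x_server.copy() for _ in range(M)]
    for _ in range(H):
        for i in range(M):
            # Noisy evaluation of the local operator; i.i.d. noise here
            # stands in for the paper's Markovian samples.
            noise = 0.1 * rng.standard_normal(d)
            x_local[i] += alpha * (A[i] @ x_local[i] + b[i] + noise)
    # Server averages the agents' iterates (FedAvg-style aggregation).
    x_server = sum(x_local) / M

err = float(np.linalg.norm(x_server - x_star))
print(f"distance to root of averaged operator: {err:.4f}")
```

With multiple local steps and heterogeneous roots, this vanilla scheme drives the iterate close to $x^*$ but stalls at an $O(\alpha H)$ bias floor; the abstract's claim is that FedHSA removes this bias and converges to the exact root without projections.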