On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations

📅 2024-11-22
🏛️ International Conference on Learning Representations
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor generalization of a single global policy in federated reinforcement learning (FedRL) under heterogeneous environments, this paper proposes PFedRL-Rep, a personalized FedRL framework that jointly learns a globally shared feature representation and locally adapted, agent-specific weight vectors. Methodologically, it casts PFedTD-Rep, an instance of the framework with temporal-difference (TD) learning and linear representations, as a federated two-timescale stochastic approximation with Markovian noise, and proves, for the first time in the personalized FedRL setting, that the convergence rate enjoys a linear speedup in the number of agents. Technically, the framework integrates TD learning, linear representation learning, deep Q-networks (DQNs), and federated optimization. Experiments demonstrate that PFedRL-Rep significantly improves learning performance in heterogeneous settings and generalizes better across environments, achieving particularly strong performance in unseen environments.

📝 Abstract
Federated reinforcement learning (FedRL) enables multiple agents to collaboratively learn a policy without sharing their local trajectories collected during agent-environment interactions. However, in practice, the environments faced by different agents are often heterogeneous, leading to poor performance by the single policy learned by existing FedRL algorithms on individual agents. In this paper, we take a further step and introduce a *personalized* FedRL framework (PFedRL) by taking advantage of possibly shared common structure among agents in heterogeneous environments. Specifically, we develop a class of PFedRL algorithms named PFedRL-Rep that learns (1) a shared feature representation collaboratively among all agents, and (2) an agent-specific weight vector personalized to its local environment. We analyze the convergence of PFedTD-Rep, a particular instance of the framework with temporal difference (TD) learning and linear representations. To the best of our knowledge, we are the first to prove a linear convergence speedup with respect to the number of agents in the PFedRL setting. To achieve this, we show that PFedTD-Rep is an example of federated two-timescale stochastic approximation with Markovian noise. Experimental results demonstrate that PFedTD-Rep, along with an extension to the control setting based on deep Q-networks (DQN), not only improves learning in heterogeneous settings, but also provides better generalization to new environments.
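The abstract's core idea — a shared representation learned federatively, plus a weight vector that never leaves each agent — can be sketched in a few lines. The following is a minimal toy illustration, not the paper's exact algorithm: the value factorization `V_i(s) = phi(s)^T B w_i`, the two-timescale TD(0) update rules, and the toy heterogeneous dynamics (per-agent transition shift and reward scale) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, D, K = 4, 8, 3                # agents, raw feature dim, representation dim
GAMMA, ALPHA, BETA = 0.9, 0.05, 0.005   # discount; fast (local) and slow (shared) step sizes
ROUNDS, LOCAL_STEPS, N_STATES = 20, 10, 10

phi = rng.normal(size=(N_STATES, D)) / np.sqrt(D)  # raw state features (hypothetical)
B = 0.1 * rng.normal(size=(D, K))                  # shared representation, learned federatively
w = [np.zeros(K) for _ in range(N_AGENTS)]         # agent-specific weights, kept local

def value(s, B_mat, w_i):
    """Linear value estimate V_i(s) = phi(s)^T B w_i."""
    return phi[s] @ B_mat @ w_i

for _ in range(ROUNDS):
    local_Bs = []
    for i in range(N_AGENTS):
        B_i = B.copy()
        for _ in range(LOCAL_STEPS):
            # Toy heterogeneous dynamics: each agent gets its own transition
            # shift and reward scale (stand-ins for distinct environments).
            s = int(rng.integers(N_STATES))
            s_next = (s + 1 + i) % N_STATES
            r = 1.0 + 0.5 * i
            delta = r + GAMMA * value(s_next, B_i, w[i]) - value(s, B_i, w[i])
            # Two-timescale TD(0): fast update of local weights,
            # slow update of the shared representation.
            w[i] = w[i] + ALPHA * delta * (B_i.T @ phi[s])
            B_i = B_i + BETA * delta * np.outer(phi[s], w[i])
        local_Bs.append(B_i)
    # Server step: average only the shared representation;
    # each w_i (and each trajectory) never leaves its agent.
    B = np.mean(local_Bs, axis=0)
```

Note the split that makes the scheme "personalized federated": only `B` is communicated and averaged, so agents pool data to learn common structure while `w_i` absorbs environment-specific differences.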
Problem

Research questions and friction points this paper is trying to address.

A single global policy learned by existing FedRL algorithms performs poorly when agents face heterogeneous environments
How to personalize FedRL by exploiting common structure possibly shared across agents' environments
Whether a linear convergence speedup in the number of agents is achievable in the personalized FedRL setting
Innovation

Methods, ideas, or system contributions that make the work stand out.

PFedRL-Rep: a personalized FedRL framework combining a globally shared feature representation with agent-specific weight vectors
First proof of linear convergence speedup with respect to the number of agents in the PFedRL setting, by casting PFedTD-Rep as federated two-timescale stochastic approximation with Markovian noise
DQN-based extension to the control setting with improved generalization to unseen environments