🤖 AI Summary
Federated Reinforcement Learning (FedRL) faces significant challenges in enabling collaboration among heterogeneous agents in black-box settings, where there is no shared data, local samples are limited, and model expressivity is constrained.
Method: This paper proposes the first federated policy distillation framework that uses action probability distributions as the knowledge carrier. It integrates policy distillation, federated optimization, and reinforcement learning to enable distribution-level knowledge transfer across agents.
Contribution/Results: The framework establishes the first theoretical convergence guarantee for FedRL under both non-i.i.d. agent policies and the absence of any public or shared dataset. Empirical evaluation across multiple RL benchmarks demonstrates substantial improvements in convergence speed and generalization performance, without requiring manually constructed public data. Results validate the method's effectiveness, robustness, and practical feasibility in realistic federated RL scenarios.
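To make the "action probability distributions as the knowledge carrier" idea concrete, here is a minimal sketch of the server-side aggregation step. The probe states, agent count, and simple averaging rule are illustrative assumptions, not details taken from the paper; the point is that each black-box agent exposes only its per-state action distributions, never its weights or gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    """Convert logits to an action probability distribution per row."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical setup: 3 heterogeneous agents, 5 probe states, 4 discrete actions.
# Each agent may use a completely different network internally; all it reports
# is a (n_states, n_actions) matrix of action probabilities.
n_agents, n_states, n_actions = 3, 5, 4
local_probs = [softmax(rng.normal(size=(n_states, n_actions)))
               for _ in range(n_agents)]

# Server step: aggregate distribution-level knowledge by averaging per state.
consensus = np.mean(local_probs, axis=0)  # shape (n_states, n_actions)

# A mean of probability distributions is still a valid distribution per state.
assert np.allclose(consensus.sum(axis=-1), 1.0)
```

Averaging is used here only as the simplest well-defined aggregator over distributions; any scheme that maps the agents' per-state distributions to one consensus distribution fits the same interface.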
📝 Abstract
Federated Reinforcement Learning (FedRL) improves sample efficiency while preserving privacy; however, most existing studies assume homogeneous agents, limiting applicability in real-world scenarios. This paper investigates FedRL in black-box settings with heterogeneous agents, where each agent employs a distinct policy network and training configuration without disclosing its internal details. Knowledge Distillation (KD) is a promising method for facilitating knowledge sharing among heterogeneous models, but when applied to FedRL it faces challenges from the scarcity of public datasets and limitations in knowledge representation. To address these challenges, we propose Federated Heterogeneous Policy Distillation (FedHPD), which solves the heterogeneous FedRL problem by utilizing action probability distributions as a medium for knowledge sharing. We provide a theoretical analysis of FedHPD's convergence under standard assumptions. Extensive experiments demonstrate that FedHPD achieves significant improvements across various reinforcement learning benchmark tasks, further validating our theoretical findings. Moreover, additional experiments show that FedHPD operates effectively without requiring an elaborately selected public dataset.
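The local side of distribution-level knowledge sharing is a distillation update: each agent nudges its own policy toward a teacher distribution received from the server. The sketch below is an assumption-laden toy, not the paper's algorithm: it minimizes KL(teacher || student) by plain gradient descent on the student's logits, using the standard identity that the gradient of the cross-entropy with respect to logits is `student_probs - teacher_probs`.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL(p || q), averaged over states (rows)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def distill_step(logits, teacher_probs, lr=1.0):
    """One gradient step on mean KL(teacher || softmax(logits)).

    d/dlogits of per-row cross-entropy is softmax(logits) - teacher_probs.
    """
    grad = (softmax(logits) - teacher_probs) / logits.shape[0]
    return logits - lr * grad

rng = np.random.default_rng(1)
teacher = softmax(rng.normal(size=(5, 4)))  # consensus distributions from the server
logits = rng.normal(size=(5, 4))            # one agent's local policy head (toy)

before = kl(teacher, softmax(logits))
for _ in range(200):
    logits = distill_step(logits, teacher)
after = kl(teacher, softmax(logits))

# Distillation moves the student's action distributions toward the consensus.
assert after < before
```

In a real agent this distillation loss would be combined with the agent's own RL objective on local experience, so the consensus guides rather than replaces local learning.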