🤖 AI Summary
This work addresses the absence of model-free policy gradient methods for mean-field control (MFC) in discrete-time settings with finite state spaces and compact action spaces. The authors propose a novel policy gradient estimation framework based on perturbations of the state-distribution flow. By combining trajectory simulation with a sensitivity analysis of the state distribution, they develop a fully model-free MF-REINFORCE algorithm that circumvents the failure of conventional likelihood-ratio estimators caused by the dependence of transitions and rewards on the population state. Theoretical analysis provides explicit bounds on both the bias and the mean-squared error of the gradient estimator, and empirical results demonstrate the algorithm’s effectiveness and scalability on representative MFC tasks.
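To make the setting concrete, the notation below is one plausible reading of the problem described above, not the paper's own formalism: the symbols, the transition map $\Phi$, and the exact form of the perturbation are assumptions introduced here for illustration only.

```latex
% Illustrative notation; not taken verbatim from the paper.
\[
  J(\theta) \;=\; \sum_{t=0}^{T}\big\langle \mu_t^{\theta},\, r\big(\cdot,\pi_\theta,\mu_t^{\theta}\big)\big\rangle,
  \qquad
  \mu_{t+1}^{\theta} \;=\; \Phi\big(\mu_t^{\theta},\pi_\theta\big),
\]
% Here $\mu_t^{\theta}$ is the population state distribution induced by the policy
% $\pi_\theta$ and $\Phi$ is the (unknown) mean-field transition map. Writing
% $J_\epsilon(\theta)$ for the value function under an $\epsilon$-perturbation of
% the flow $(\mu_t^{\theta})_t$, the paper's convergence claim can be summarized as
\[
  \nabla_\theta J_\epsilon(\theta) \;\longrightarrow\; \nabla_\theta J(\theta)
  \quad \text{as } \epsilon \to 0 .
\]
```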
📝 Abstract
We study model-free policy learning for discrete-time mean-field control (MFC) problems with finite state space and compact action space. In contrast to the extensive literature on value-based methods for MFC, policy-based approaches remain largely unexplored due to the intrinsic dependence of transition kernels and rewards on the evolving population state distribution, which prevents the direct use of likelihood-ratio estimators of policy gradients from classical single-agent reinforcement learning. We introduce a novel perturbation scheme on the state-distribution flow and prove that the gradient of the resulting perturbed value function converges to the true policy gradient as the perturbation magnitude vanishes. This construction yields a fully model-free estimator based solely on simulated trajectories and an auxiliary estimate of the sensitivity of the state distribution. Building on this framework, we develop MF-REINFORCE, a model-free policy gradient algorithm for MFC, and establish explicit quantitative bounds on its bias and mean-squared error. Numerical experiments on representative mean-field control tasks demonstrate the effectiveness of the proposed approach.
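Since the abstract stays at a high level, the following is a minimal sketch of how a REINFORCE-style training loop for MFC could be organized, assuming a finite state space, a discretized action set, a tabular softmax policy, and a user-supplied population simulator. The names `env_step`, `reward`, and the zero-valued `distribution_sensitivity_correction` stub are hypothetical; the stub only marks where the paper's sensitivity-based correction to the plain likelihood-ratio term would enter.

```python
# Minimal sketch of a REINFORCE-style loop for mean-field control (MFC).
# Assumptions (not from the paper): finite state space of size S, a discretized
# action set of size A, a tabular softmax policy theta[s, a], and a user-supplied
# simulator env_step(states, actions, mu) returning the next agent states.
# distribution_sensitivity_correction is a hypothetical stub marking where a
# correction based on the sensitivity of the state-distribution flow would go;
# the plain score-function term below is in general biased for MFC.

import numpy as np

S, A, T, N = 5, 3, 20, 1000          # states, actions, horizon, simulated agents
rng = np.random.default_rng(0)


def policy_probs(theta, s):
    """Softmax action probabilities for state s."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()


def rollout(theta, env_step, reward, mu0):
    """Simulate one population trajectory of length T."""
    states = rng.choice(S, size=N, p=mu0)
    traj = []
    for _ in range(T):
        mu = np.bincount(states, minlength=S) / N             # empirical distribution
        actions = np.array([rng.choice(A, p=policy_probs(theta, s)) for s in states])
        r = float(np.mean(reward(states, actions, mu)))       # population-average reward
        traj.append((states.copy(), actions, mu, r))
        states = env_step(states, actions, mu)                 # model-free: simulator only
    return traj


def distribution_sensitivity_correction(theta, traj):
    """Stub for the sensitivity-based correction term (zero in this sketch)."""
    return np.zeros_like(theta)


def mf_reinforce_gradient(theta, traj):
    """Score-function term plus the (stubbed) distribution-sensitivity correction."""
    grad = np.zeros_like(theta)
    rewards = np.array([r for *_, r in traj])
    to_go = np.cumsum(rewards[::-1])[::-1]                     # reward-to-go returns
    for (states, actions, mu, _), G in zip(traj, to_go):
        for s, a in zip(states, actions):
            p = policy_probs(theta, s)
            grad[s] -= G * p / N                               # grad log softmax: -pi(.|s)
            grad[s, a] += G / N                                #                   +e_a
    return grad + distribution_sensitivity_correction(theta, traj)


def train(env_step, reward, mu0, iters=200, lr=0.05):
    theta = np.zeros((S, A))
    for _ in range(iters):
        traj = rollout(theta, env_step, reward, mu0)
        theta += lr * mf_reinforce_gradient(theta, traj)       # gradient ascent step
    return theta
```

The zero-valued stub keeps the sketch runnable; per the abstract, it is the correction derived from the auxiliary estimate of the state-distribution sensitivity that addresses the population-state dependence, with the resulting bias and mean-squared error quantified by the paper's bounds.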