Non-convex entropic mean-field optimization via Best Response flow

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies KL-divergence-regularized non-convex optimization over spaces of probability measures, together with the associated entropy-regularized non-convex-non-concave minimax problems. It introduces an analytical framework based on the Best Response flow (also known as the fictitious play flow), establishing for the first time a quantitative relationship between the degree of non-convexity, the regularization strength, and the tail decay of the reference measure. By tuning the regularization parameter appropriately, the Best Response operator can be made a contraction with respect to the $L^1$-Wasserstein metric, which guarantees the existence of a unique fixed point and global convergence to the unique minimizer. These results provide the first non-convex analysis framework for mean-field policy optimization in Markov decision processes (MDPs) and Markov games with softmax-parametrized policies, ensuring both convergence guarantees and computational tractability.
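For orientation, in the notation common to the entropic mean-field optimization literature (the paper's exact sign and scaling conventions may differ), the regularized objective and its Best Response operator take the form

$$
F^{\sigma}(m) = F(m) + \sigma^{2}\, \mathrm{KL}(m \,\|\, \pi), \qquad \mathrm{BR}(m)(dx) \propto \exp\!\Big( -\tfrac{1}{\sigma^{2}}\, \tfrac{\delta F}{\delta m}(m, x) \Big)\, \pi(dx),
$$

where $\delta F/\delta m$ is the flat derivative of the non-convex functional $F$ and $\pi$ is the reference measure. A fixed point of $\mathrm{BR}$ is a stationary point of $F^{\sigma}$; the contraction result is what makes that fixed point unique and globally attracting.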

📝 Abstract
We study the problem of minimizing non-convex functionals on the space of probability measures, regularized by the relative entropy (KL divergence) with respect to a fixed reference measure, as well as the corresponding problem of solving entropy-regularized non-convex-non-concave min-max problems. We utilize the Best Response flow (also known in the literature as the fictitious play flow) and study how its convergence is influenced by the relation between the degree of non-convexity of the functional under consideration, the regularization parameter, and the tail behaviour of the reference measure. In particular, we demonstrate how to choose the regularizer, given the non-convex functional, so that the Best Response operator becomes a contraction with respect to the $L^1$-Wasserstein distance; this ensures the existence of a unique fixed point, which in turn is shown to be the unique global minimizer of our optimization problem. This extends recent results in which the Best Response flow was applied to solve convex optimization problems regularized by the relative entropy with respect to arbitrary reference measures, with arbitrary values of the regularization parameter. Our results explain precisely how the assumption of convexity can be relaxed, at the expense of making a specific choice of the regularizer. Additionally, we demonstrate how these results can be applied in reinforcement learning, in the context of policy optimization for Markov Decision Processes and Markov games with softmax-parametrized policies in the mean-field regime.
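To make the fixed-point iteration concrete, below is a minimal numerical sketch on a 1-D grid. The functional (a double-well potential plus a smooth interaction kernel), the choice $\sigma^2 = 2$, and the helper names `flat_derivative`, `best_response`, and `w1_distance` are illustrative assumptions, not the paper's construction:

```python
import numpy as np

# Minimal sketch of the Best Response fixed-point iteration for
#   min_m  F(m) + sigma^2 * KL(m || pi)
# discretized on a 1-D grid. The functional F below is an
# illustrative stand-in, not the paper's example.

x = np.linspace(-4.0, 4.0, 400)
dx = x[1] - x[0]

pi_ref = np.exp(-x**2 / 2.0)          # reference measure: Gaussian density
pi_ref /= pi_ref.sum() * dx           # normalize on the grid

V = (x**2 - 1.0)**2                   # double-well potential (source of non-convexity)
K = np.cos(x[:, None] - x[None, :])   # smooth interaction kernel K(x - y)
sigma2 = 2.0                          # regularization strength sigma^2

def flat_derivative(m):
    """Flat derivative dF/dm(m, x) = V(x) + int K(x - y) m(y) dy on the grid."""
    return V + K @ m * dx

def best_response(m):
    """BR(m) proportional to pi_ref * exp(-(1/sigma^2) dF/dm(m, .))."""
    g = pi_ref * np.exp(-flat_derivative(m) / sigma2)
    return g / (g.sum() * dx)

def w1_distance(m1, m2):
    """1-D L^1-Wasserstein distance via the CDF formula W1 = int |F1 - F2|."""
    return np.sum(np.abs(np.cumsum(m1 - m2)) * dx) * dx

m = pi_ref.copy()                     # start the flow from the reference measure
for it in range(500):
    m_next = best_response(m)
    gap = w1_distance(m_next, m)
    m = m_next
    if gap < 1e-12:
        break

print(f"stopped after {it + 1} iterations, last W1 step size {gap:.2e}")
```

For $\sigma^2$ large enough relative to the oscillation of the flat derivative and the tails of the reference measure, successive $W_1$ gaps shrink geometrically, which is the contraction behaviour the paper quantifies.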
Problem

Research questions and friction points this paper is trying to address.

Minimizing non-convex functionals on probability measures with KL regularization
Analyzing Best Response flow convergence under non-convexity and regularization
Applying the results to reinforcement learning policy optimization with softmax-parametrized policies (see the sketch after this list)
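The softmax connection rests on a standard fact about KL-regularized decision problems, stated here in a generic one-step form that may differ in detail from the paper's mean-field setup: the best response to state-action values $Q(s,a)$, regularized towards a reference policy $\mu$ with temperature $\tau$, is the Gibbs/softmax policy

$$
\pi^{*}(\cdot \mid s) = \arg\max_{\pi}\Big\{ \mathbb{E}_{a \sim \pi}\big[Q(s,a)\big] - \tau\, \mathrm{KL}\big(\pi \,\|\, \mu\big) \Big\}, \qquad \pi^{*}(a \mid s) \propto \mu(a)\, \exp\!\big(Q(s,a)/\tau\big).
$$

Best Response iterations on policies therefore stay within the softmax-parametrized class, which is what lets the contraction analysis carry over to policy optimization.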
Innovation

Methods, ideas, or system contributions that make the work stand out.

Best Response flow for non-convex optimization
KL divergence regularization with reference measure
Contraction in the $L^1$-Wasserstein distance ensures existence and uniqueness of the global minimizer