Gaussian-Mixture-Model Q-Functions for Policy Iteration in Reinforcement Learning

📅 2025-12-21
🤖 AI Summary
This work addresses the low approximation accuracy and training instability of Q-function estimation in reinforcement learning. We propose directly parameterizing the Q-function with a Gaussian Mixture Model (GMM) and embedding it in the Bellman residual as a differentiable surrogate, enabling end-to-end policy iteration without experience replay. To our knowledge, this is the first work to establish the GMM as a universal Q-function approximator. We further introduce Riemannian manifold optimization to naturally enforce positive definiteness of the covariance matrices, thereby improving training stability. We provide theoretical guarantees on generalization error bounds and convergence. Empirically, our method matches or surpasses state-of-the-art performance across multiple benchmark tasks, while incurring significantly lower computational overhead than DQN and eliminating reliance on experience sampling or replay buffers.
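As a concrete illustration of the parameterization described above, a GMM-QF evaluates a weighted sum of Gaussians on the joint state-action vector. The sketch below is a minimal numpy version; the function and variable names are illustrative, not the paper's implementation, and the mixing weights are deliberately left unconstrained (signed), since here the GMM acts as a function approximator rather than a probability density.

```python
import numpy as np

def gmm_qf(sa, weights, means, covs):
    """Evaluate a GMM-parameterized Q-function at a state-action vector.

    Q(s, a) = sum_k w_k * N([s; a]; mu_k, Sigma_k).  Unlike in density
    estimation, the weights w_k need not be nonnegative or sum to one.
    """
    d = sa.shape[0]
    q = 0.0
    for w, mu, cov in zip(weights, means, covs):
        diff = sa - mu
        expo = -0.5 * diff @ np.linalg.solve(cov, diff)
        norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
        q += w * np.exp(expo) / norm
    return q

# Toy usage: 2-D state, 1-D action, K = 2 components with signed weights
rng = np.random.default_rng(0)
sa = np.array([0.5, -0.2, 1.0])              # concatenated [state; action]
weights = np.array([1.3, -0.4])
means = rng.normal(size=(2, 3))
covs = np.stack([np.eye(3), 2.0 * np.eye(3)])
print(gmm_qf(sa, weights, means, covs))
```

The mixture means, covariances, and weights are exactly the learnable parameters that the policy-evaluation step tunes.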

📝 Abstract
In contrast to their conventional use as estimators of probability density functions, Gaussian mixture models (GMMs) are given a novel function-approximation role in this paper: serving as direct surrogates for Q-function losses in reinforcement learning (RL). These parametric models, termed GMM-QFs, possess substantial representational capacity, as they are shown to be universal approximators over a broad class of functions. They are further embedded within Bellman residuals, where their learnable parameters -- a fixed number of mixing weights, together with Gaussian mean vectors and covariance matrices -- are inferred from data via optimization on a Riemannian manifold. This geometric perspective on the parameter space naturally incorporates Riemannian optimization into the policy-evaluation step of standard policy-iteration frameworks. Rigorous theoretical results are established, and supporting numerical tests show that, even without access to experience data, GMM-QFs deliver competitive performance and, in some cases, outperform state-of-the-art approaches across a range of benchmark RL tasks, all while maintaining a significantly smaller computational footprint than deep-learning methods that rely on experience data.
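The abstract's Riemannian-optimization step can be illustrated on the manifold of symmetric positive-definite (SPD) covariance matrices. The sketch below is an assumption about implementation details, not the paper's code: it uses the affine-invariant metric and its exponential-map update, which keeps the covariance positive definite for any step size -- the stability property the abstract alludes to.

```python
import numpy as np

def sym_funcm(mat, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T

def spd_step(sigma, euc_grad, lr=0.1):
    """One Riemannian gradient step for a covariance matrix on the SPD manifold.

    Under the affine-invariant metric, the Riemannian gradient of a loss
    with (symmetrized) Euclidean gradient G is Sigma @ G @ Sigma; the map
    Sigma^{1/2} expm(-lr * Sigma^{-1/2} rgrad Sigma^{-1/2}) Sigma^{1/2}
    then returns a matrix that is positive definite by construction.
    """
    g = 0.5 * (euc_grad + euc_grad.T)          # project onto symmetric matrices
    rgrad = sigma @ g @ sigma                  # Riemannian gradient
    s_half = sym_funcm(sigma, np.sqrt)
    s_half_inv = sym_funcm(sigma, lambda v: 1.0 / np.sqrt(v))
    inner = s_half_inv @ (-lr * rgrad) @ s_half_inv
    return s_half @ sym_funcm(inner, np.exp) @ s_half

# Even an aggressively large step stays on the manifold, whereas the
# Euclidean update sigma - lr * grad would leave the SPD cone here.
sigma = np.eye(2)
grad = np.array([[5.0, 1.0], [1.0, 5.0]])
updated = spd_step(sigma, grad, lr=1.0)
print(np.linalg.eigvalsh(updated))   # all eigenvalues strictly positive
```

This is one standard retraction choice for SPD manifolds; libraries such as Pymanopt offer equivalent machinery.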
Problem

Research questions and friction points this paper is trying to address.

Q-function estimation with existing approximators suffers from low approximation accuracy and training instability
Deep methods such as DQN incur high computational overhead and depend on experience sampling and replay buffers
Positive definiteness of covariance matrices is hard to maintain under unconstrained (Euclidean) optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

GMMs approximate Q-functions directly as loss surrogates
Parameters learned via Riemannian manifold optimization
Competitive performance with smaller computational footprint
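The "loss surrogate" role in the bullets above amounts to plugging the parameterized Q-function into an empirical Bellman residual. Below is a minimal sketch, assuming a finite action set and a greedy (Q-learning-style) target; on-policy evaluation would instead use the current policy's action at the next state.

```python
def bellman_residual(q_fn, transitions, actions, gamma=0.99):
    """Mean squared Bellman residual of a candidate Q-function.

    q_fn(s, a) -> float can be any differentiable approximator, e.g. a
    GMM-QF whose mixture parameters are tuned to minimize this loss
    during the policy-evaluation step.
    """
    total = 0.0
    for s, a, r, s_next in transitions:
        target = r + gamma * max(q_fn(s_next, b) for b in actions)
        total += (q_fn(s, a) - target) ** 2
    return total / len(transitions)

# Toy check: for the all-zero Q-function and unit rewards, the residual is 1
transitions = [((0.0, 0.0), 0, 1.0, (0.0, 0.0)) for _ in range(4)]
print(bellman_residual(lambda s, a: 0.0, transitions, actions=[0, 1]))
```

Because the GMM parameters enter this loss differentiably, the residual can be driven down by gradient steps (Riemannian ones for the covariances) without any replay buffer.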
Minh Vu
Department of Information and Communications, Institute of Science Tokyo, 4259-G2-4 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa, Japan
Konstantinos Slavakis
Institute of Science Tokyo (formerly Tokyo Tech), Department of Information and Communications Engineering
Signal processing · Machine learning