Gaussian-Mixture-Model Q-Functions for Policy Iteration in Reinforcement Learning

📅 2025-12-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets Q-function approximation in reinforcement learning. It parameterizes the Q-function directly with a Gaussian mixture model (GMM-QF) and embeds that model in Bellman residuals, so that the policy-evaluation step of policy iteration amounts to fitting a fixed number of mixing weights together with Gaussian mean vectors and covariance matrices. GMM-QFs are shown to be universal approximators over a broad class of functions. The parameters are learned by optimization on a Riemannian manifold, which naturally respects the positive definiteness of the covariance matrices. The paper establishes rigorous theoretical results and reports numerical tests in which, even without access to experience data, GMM-QFs perform competitively with, and in some cases outperform, state-of-the-art approaches across a range of benchmark tasks, with a significantly smaller computational footprint than deep-learning methods that rely on experience data.

📝 Abstract

In reinforcement learning (RL), Gaussian mixture models (GMMs) are conventionally used as estimators of probability density functions. This paper instead introduces a novel function-approximation role for GMMs as direct surrogates for Q-function losses. These parametric models, termed GMM-QFs, possess substantial representational capacity, as they are shown to be universal approximators over a broad class of functions. They are further embedded within Bellman residuals, where their learnable parameters -- a fixed number of mixing weights, together with Gaussian mean vectors and covariance matrices -- are inferred from data via optimization on a Riemannian manifold. This geometric perspective on the parameter space naturally incorporates Riemannian optimization into the policy-evaluation step of standard policy-iteration frameworks. Rigorous theoretical results are established, and supporting numerical tests show that, even without access to experience data, GMM-QFs deliver competitive performance and, in some cases, outperform state-of-the-art approaches across a range of benchmark RL tasks, all while maintaining a significantly smaller computational footprint than deep-learning methods that rely on experience data.
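
To make the idea of a GMM used as a function approximator inside a Bellman residual concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation. The functional form (a weighted sum of unnormalized Gaussian bumps over the stacked state-action vector), the names `gmm_qf` and `bellman_residual_loss`, the `next_action` policy callback, the discount `gamma`, and the batch of transition tuples are all assumptions made for illustration.

```python
import numpy as np

def gmm_qf(z, weights, means, covs):
    """Evaluate a GMM-style Q-function at a state-action vector z.

    Assumed form (illustrative, not taken from the paper):
        Q(z) = sum_k w_k * exp(-0.5 * (z - m_k)^T C_k^{-1} (z - m_k))
    The weights are left as unconstrained reals so Q can be negative.
    """
    q = 0.0
    for w, m, C in zip(weights, means, covs):
        d = z - m
        # Solve C x = d instead of forming the inverse of C explicitly.
        q += w * np.exp(-0.5 * d @ np.linalg.solve(C, d))
    return q

def bellman_residual_loss(transitions, next_action, params, gamma=0.9):
    """Mean squared Bellman residual of the GMM Q-function for a fixed policy.

    transitions: iterable of (s, a, r, s_next) with 1-D arrays s, a, s_next.
    next_action: hypothetical callback returning the policy's action at s_next.
    params:      tuple (weights, means, covs) passed on to gmm_qf.
    """
    total = 0.0
    for s, a, r, s_next in transitions:
        z = np.concatenate([s, a])
        z_next = np.concatenate([s_next, next_action(s_next)])
        residual = r + gamma * gmm_qf(z_next, *params) - gmm_qf(z, *params)
        total += residual ** 2
    return total / len(transitions)
```

In a policy-iteration loop, a loss of this kind would be minimized over the weights, means, and covariances during policy evaluation, after which the policy is improved against the fitted Q-function. How the paper actually builds its residual and obtains its data may differ from this sketch.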

Problem

Research questions and friction points this paper is trying to address.

Introduces Gaussian mixture models as Q-function loss surrogates

Embeds GMMs in Bellman residuals with Riemannian manifold optimization

Achieves competitive RL performance with smaller computational footprint

Innovation

Methods, ideas, or system contributions that make the work stand out.

GMMs approximate Q-functions directly as loss surrogates

Parameters learned via Riemannian manifold optimization

Competitive performance with smaller computational footprint
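
As an illustration of why a Riemannian update suits covariance parameters, the sketch below takes one gradient step on the manifold of symmetric positive-definite (SPD) matrices. It assumes the affine-invariant metric and its exponential map, a common choice for SPD matrices. The summary above does not say which metric or retraction the paper uses, and the names `spd_riemannian_step`, `euclid_grad`, and `lr` are hypothetical.

```python
import numpy as np

def _sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * fun(vals)) @ vecs.T

def spd_riemannian_step(C, euclid_grad, lr=0.1):
    """One Riemannian gradient-descent step on the SPD manifold.

    Assumes the affine-invariant metric (an illustrative choice):
        Riemannian gradient:  C @ sym(G) @ C
        Exponential map step: C^{1/2} expm(-lr * C^{1/2} sym(G) C^{1/2}) C^{1/2}
    The result is SPD for any step size, unlike a plain Euclidean update.
    """
    G = 0.5 * (euclid_grad + euclid_grad.T)       # symmetrize the Euclidean gradient
    C_half = _sym_fun(C, np.sqrt)                 # matrix square root of C
    M = C_half @ G @ C_half
    M = 0.5 * (M + M.T)                           # remove round-off asymmetry
    C_new = C_half @ _sym_fun(-lr * M, np.exp) @ C_half
    return 0.5 * (C_new + C_new.T)
```

For example, with `C = np.eye(2)`, `euclid_grad = np.diag([10.0, -10.0])`, and `lr = 1.0`, the Euclidean update `C - lr * euclid_grad` has a negative eigenvalue, while the step above returns a matrix whose eigenvalues are all positive.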

🔎 Similar Papers

Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning