🤖 AI Summary
Reinforcement learning controllers exhibit poor generalization when dynamics differ between training and testing environments. To address this, we propose a model-based framework grounded in dimensionless Markov decision processes (Π-MDPs), the first to systematically incorporate the Buckingham-Π theorem into MDP modeling. Our approach applies physical dimension normalization to both state and action spaces, endowing learned policies with equivariance and intrinsic robustness to variations in physical parameters such as gravity, mass, and length. The framework integrates context-aware MDPs, Gaussian process dynamics modeling, and model-based policy search. Evaluated on simulated actuated pendulum and cart-pole tasks, policies trained in a single environment maintain stable performance under large physical parameter shifts—e.g., ±50% variation in gravity or mass—demonstrating significantly improved generalization over baseline methods.
📝 Abstract
Controllers trained with Reinforcement Learning tend to be highly specialized and thus generalize poorly when their testing environment differs from their training one. We propose a model-based approach to improve generalization in which both the world model and the policy are trained in a dimensionless state-action space. To this end, we introduce the Dimensionless Markov Decision Process ($\Pi$-MDP): an extension of Contextual-MDPs in which the state and action spaces are non-dimensionalized using the Buckingham-$\Pi$ theorem. This procedure induces policies that are equivariant with respect to changes in the context of the underlying dynamics. We provide a generic framework for this approach and apply it to a model-based policy search algorithm using Gaussian Process models. We demonstrate the applicability of our method on simulated actuated pendulum and cartpole systems, where policies trained in a single environment are robust to shifts in the distribution of the context.
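To make the non-dimensionalization step concrete, here is a minimal sketch for an actuated pendulum, assuming the state is (angle, angular velocity) and the action is a torque applied at the pivot. The particular choice of $\Pi$-groups below (characteristic time $\sqrt{l/g}$, characteristic torque $mgl$) is illustrative and may differ from the paper's exact construction; the context is the tuple of physical parameters $(m, g, l)$.

```python
import numpy as np

def to_dimensionless(theta, theta_dot, torque, m, g, l):
    """Map a pendulum state-action pair into dimensionless Pi-groups.

    Illustrative groups (not necessarily the paper's):
      - angle theta is already dimensionless,
      - angular velocity is scaled by the characteristic time sqrt(l/g),
      - torque is scaled by the characteristic torque m*g*l.
    """
    t_c = np.sqrt(l / g)              # characteristic time [s]
    pi_theta = theta                  # [rad], dimensionless already
    pi_theta_dot = theta_dot * t_c    # [rad/s] * [s] -> dimensionless
    pi_torque = torque / (m * g * l)  # [N*m] / [N*m] -> dimensionless
    return pi_theta, pi_theta_dot, pi_torque

def from_dimensionless(pi_theta, pi_theta_dot, pi_torque, m, g, l):
    """Invert to_dimensionless for a given physical context (m, g, l)."""
    t_c = np.sqrt(l / g)
    return pi_theta, pi_theta_dot / t_c, pi_torque * (m * g * l)
```

A policy defined on these $\Pi$-groups is automatically shared across contexts that collapse to the same dimensionless dynamics; for instance, halving both $g$ and $l$ leaves the characteristic time $\sqrt{l/g}$ unchanged, so such contexts look identical to the dimensionless policy.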