AI Summary
This paper addresses the weak theoretical foundations of reinforcement learning (RL) in continuous state-action spaces. To this end, it introduces a geometric analytical framework that constructs a manifold of state reachability induced by stochastic policies. This framework establishes, for the first time, a rigorous theoretical connection between the geometric structure of the state space and the dimensionality of the action space: it proves that the intrinsic dimension of the reachable manifold scales with, and is upper bounded by, the action-space dimension. Methodologically, the approach integrates a local manifold-learning layer into a two-layer neural policy network, embedded within an actor-critic architecture and trained via semi-gradient optimization to learn sparse, low-dimensional state representations. Experiments on MuJoCo benchmarks and synthetic environments demonstrate substantial improvements in sample efficiency and policy performance on high-degree-of-freedom control tasks.
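The upper bound on the intrinsic dimension of the reachable set can be illustrated with a minimal numerical sketch (not the paper's construction): rolling out a random stochastic policy with a 2-D action in a 10-D linear system, the visited states concentrate on a subspace whose dimension is bounded by the action dimension. The dynamics, horizon, and tolerance below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch, not the paper's method: states reachable under a
# stochastic policy with a low-dimensional action lie on a low-dimensional
# set inside the nominal state space.
rng = np.random.default_rng(0)
state_dim, action_dim, horizon = 10, 2, 500

B = rng.standard_normal((state_dim, action_dim))  # actions enter via B
s = np.zeros(state_dim)
states = []
for _ in range(horizon):
    a = rng.standard_normal(action_dim)  # sample from a stochastic policy
    s = 0.9 * s + B @ a                  # one-step linear dynamics
    states.append(s)

X = np.array(states)
X -= X.mean(axis=0)
# The singular-value spectrum reveals the intrinsic dimension of the data.
sv = np.linalg.svd(X, compute_uv=False)
intrinsic_dim = int((sv / sv[0] > 1e-8).sum())
print(intrinsic_dim)  # bounded above by action_dim (here 2)
```

Here every visited state lies in the column space of `B`, so the estimated dimension matches the action dimension rather than the 10-D ambient state space, mirroring the paper's claimed upper bound in this toy linear setting.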
Abstract
Advances in reinforcement learning (RL) have led to its successful application in complex tasks with continuous state and action spaces. Despite these practical advances, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces by employing a geometric lens to study the locally attained set of states. The set of all parametrised policies learnt through a semi-gradient-based approach induces a set of attainable states in RL. We show that the training dynamics of a two-layer neural policy, trained with an actor-critic algorithm, induce a low-dimensional manifold of attainable states embedded in the high-dimensional nominal state space. We prove that, under certain conditions, the dimensionality of this manifold is of the order of the dimensionality of the action space. This is the first result of its kind, linking the geometry of the state space to the dimensionality of the action space. We empirically corroborate this upper bound in four MuJoCo environments and in a toy environment with varying dimensionality. We also demonstrate the practical relevance of this result by introducing a local manifold-learning layer into the policy and value-function networks: changing a single layer of the neural network to learn sparse representations improves performance in control environments with very high degrees of freedom.