🤖 AI Summary
To address the high policy evaluation cost and low search efficiency in Evolutionary Reinforcement Learning (ERL), this paper proposes a surrogate-assisted framework integrating autoencoders and hyperbolic neural networks. Our method introduces the first learnable low-dimensional embedding for ERL policies and constructs a classification-based surrogate model to jointly enable efficient pre-screening and quality assessment of high-dimensional deep neural network policies. By synergistically combining the autoencoder’s nonlinear dimensionality reduction capability with hyperbolic space’s intrinsic capacity to model hierarchical policy structures, our approach significantly improves exploration quality and convergence speed. Evaluated on ten Atari and four MuJoCo benchmark tasks, it consistently outperforms state-of-the-art ERL baselines. Visualization analysis further confirms superior search trajectories, more thorough exploration, and faster convergence.
📝 Abstract
Evolutionary Reinforcement Learning (ERL), which trains Reinforcement Learning (RL) policies with Evolutionary Algorithms (EAs), has demonstrated enhanced exploration capabilities and greater robustness than traditional policy gradient methods. However, ERL suffers from high computational costs and low search efficiency, as EAs require evaluating numerous candidate policies with expensive simulations, many of which are ineffective and do not contribute meaningfully to training. One intuitive way to reduce ineffective evaluations is to adopt surrogate models. Unfortunately, existing ERL policies are often modeled as deep neural networks (DNNs) and thus naturally represented as high-dimensional vectors containing millions of weights, which makes building effective surrogates for ERL policies extremely challenging. This paper proposes a novel surrogate-assisted ERL framework that integrates Autoencoders (AE) and Hyperbolic Neural Networks (HNN). Specifically, the AE compresses high-dimensional policies into low-dimensional representations while extracting key features as inputs for the surrogate. The HNN, functioning as a classification-based surrogate model, learns complex nonlinear relationships from sampled data and enables more accurate pre-selection of sampled policies without real evaluations. Experiments on 10 Atari and 4 MuJoCo games verify that the proposed method significantly outperforms previous approaches. The search trajectories guided by the AE and HNN are also visually demonstrated to be more effective, in terms of both exploration and convergence. This paper not only presents the first learnable policy embedding and surrogate-modeling modules for high-dimensional ERL policies, but also empirically reveals when and why they can be successful.
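The overall pre-screening loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoder here is a fixed random linear projection standing in for the trained autoencoder, and `surrogate_score` is a toy scoring rule standing in for the trained HNN classifier; all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: each ERL policy is a flattened DNN weight vector.
POLICY_DIM, LATENT_DIM = 1000, 8

# Stand-in for the AE encoder: a fixed random linear projection.
# The paper instead learns a nonlinear autoencoder on sampled policies.
W_enc = rng.normal(size=(POLICY_DIM, LATENT_DIM)) / np.sqrt(POLICY_DIM)

def encode(policies):
    """Compress high-dimensional policy weights into latent features."""
    return policies @ W_enc

def surrogate_score(latents):
    """Stand-in for the HNN surrogate: probability that a policy is 'promising'.
    The real model is a classifier trained on (latent, fitness-label) pairs."""
    logits = latents.sum(axis=1)           # toy scoring rule for illustration
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid

def prescreen(population, k):
    """Keep only the k candidates the surrogate ranks highest, so the
    expensive simulator is run on a fraction of the population."""
    scores = surrogate_score(encode(population))
    return population[np.argsort(scores)[-k:]]

population = rng.normal(size=(64, POLICY_DIM))  # 64 candidate policies
survivors = prescreen(population, k=8)
print(survivors.shape)  # (8, 1000): only these 8 go to real evaluation
```

The key design point the abstract emphasizes is that the surrogate never sees the raw million-weight vectors; it operates on the compact AE embedding, which is what makes surrogate modeling tractable for DNN policies.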