🤖 AI Summary
This work addresses the difficulty of obtaining multi-parameter dynamic control policies that are both effective and interpretable, a gap that hinders theoretical analysis and performance improvement of evolutionary algorithms. For the first time, it integrates deep reinforcement learning (Double DQN) with policy distillation to automatically discover high-performing, interpretable symbolic control rules for the (1+(λ,λ)) genetic algorithm on the OneMax problem. Algorithm-agnostic enhancements (action-space decomposition, reward shifting, and long-horizon discounting) are introduced to stabilize training, after which the learned neural policy is distilled into a concise symbolic policy. The resulting policy consistently outperforms existing baselines across a wide range of problem sizes while retaining a clear logical structure, which makes it a candidate for theoretical analysis beyond the single-parameter settings that such analysis has largely been restricted to.
📝 Abstract
While deep reinforcement learning (deep-RL) has been increasingly applied to parameter control in evolutionary algorithms, rigorous theoretical analysis of parameter control remains largely restricted to single-parameter settings, owing to the difficulty of deriving effective, interpretable multi-parameter policies amenable to formal study. We demonstrate how deep-RL can be leveraged to overcome this barrier, using the $(1+(\lambda,\lambda))$ genetic algorithm optimizing OneMax as a representative case study; this is one of the few settings in which a super-constant speedup from dynamic control has been formally proven. We first show that standard approaches struggle to converge in this multi-parameter setting, and introduce algorithm-agnostic enhancements targeting action-space decomposition, reward shifting, and long-horizon discounting. With these in place, we compare common deep-RL methods and find that Double Deep Q-Networks avoid the policy collapse observed in Proximal Policy Optimization, yielding trajectories suitable for downstream analysis. Crucially, we move beyond the "black-box" nature of neural networks by distilling the learned behaviors into a transparent, symbolic control policy. The resulting policy not only offers interpretability for future theoretical analysis but also delivers strong performance, consistently outperforming existing baselines across a wide range of problem sizes.
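
For readers unfamiliar with the algorithm being controlled, the following is a minimal Python sketch of one common formulation of the (1+(λ,λ)) genetic algorithm on OneMax, with the parameter choice factored out into a policy callback. It is an illustration under stated assumptions, not the setup used in this work: the four-value policy signature (mutation population size, mutation rate, crossover population size, crossover bias), the `static_policy` placeholder with λ = 8, the rule that forces at least one bit flip, and all function names are hypothetical choices made for this example.

```python
import random


def onemax(x):
    """OneMax fitness: the number of one-bits in x."""
    return sum(x)


def static_policy(n, fitness, lam=8):
    """Hypothetical placeholder policy (not taken from the paper).

    Uses the common static coupling p = lam / n and c = 1 / lam and
    ignores the current fitness. A learned or symbolic controller
    would return state-dependent values instead.
    """
    return lam, lam / n, lam, 1.0 / lam


def one_plus_lambda_lambda_ga(n, policy, rng, max_evals=200_000):
    """Run the GA on OneMax; return (best fitness, evaluations used)."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = onemax(x)
    evals = 1
    while fx < n and evals < max_evals:
        # Assumed policy interface: four parameters chosen per iteration.
        lam_m, p, lam_c, c = policy(n, fx)

        # Mutation phase: draw one mutation strength ell ~ Bin(n, p),
        # forced to be at least 1 (a simplification), then create lam_m
        # mutants that each flip exactly ell distinct bits of x.
        ell = max(1, sum(rng.random() < p for _ in range(n)))
        best_mut, f_mut = None, -1
        for _ in range(lam_m):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] ^= 1
            fy = onemax(y)
            evals += 1
            if fy > f_mut:
                best_mut, f_mut = y, fy

        # Crossover phase: biased uniform crossover that takes each bit
        # from the best mutant with probability c, otherwise from x.
        best_cross, f_cross = None, -1
        for _ in range(lam_c):
            y = [m if rng.random() < c else a for a, m in zip(x, best_mut)]
            fy = onemax(y)
            evals += 1
            if fy > f_cross:
                best_cross, f_cross = y, fy

        # Elitist selection: accept the best crossover offspring if it
        # is at least as good as the parent.
        if f_cross >= fx:
            x, fx = best_cross, f_cross
    return fx, evals


if __name__ == "__main__":
    fitness, evals = one_plus_lambda_lambda_ga(100, static_policy, random.Random(1))
    print(f"fitness={fitness}, evaluations={evals}")
```

Keeping the controller behind a `policy(n, fitness)` callback means a static rule, a trained network, or a distilled symbolic rule can be swapped in without touching the search loop, which is the kind of interface a multi-parameter control study needs.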