📝 Abstract
Reinforcement learning has recently gained traction as a means to improve combinatorial optimization methods, yet its effectiveness within local search metaheuristics specifically remains comparatively underexamined. In this study, we evaluate a range of reinforcement learning-based neighborhood selection strategies -- multi-armed bandits (upper confidence bound, $\epsilon$-greedy) and deep reinforcement learning methods (proximal policy optimization, double deep $Q$-network) -- and compare them against multiple baselines across three problems: the traveling salesman problem, the pickup and delivery problem with time windows, and the car sequencing problem. We show how search-specific characteristics, particularly large cost variations caused by constraint violation penalties, necessitate carefully designed reward functions that provide stable and informative learning signals. Our extensive experiments reveal that algorithm performance varies substantially across problems, although $\epsilon$-greedy consistently ranks among the best performers. In contrast, deep reinforcement learning approaches become competitive only when given substantially longer runtimes, owing to their computational overhead. These findings highlight both the promise and the practical limitations of deep reinforcement learning in local search.
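The $\epsilon$-greedy neighborhood selection that the abstract singles out as the most consistent performer can be sketched as a simple multi-armed bandit over the available neighborhoods. This is a hypothetical illustration only: the class name, the incremental-mean value update, and the reward shaping hinted at in the comments are assumptions, not the authors' implementation.

```python
import random

class EpsilonGreedySelector:
    """Epsilon-greedy bandit over a fixed set of neighborhoods (illustrative sketch)."""

    def __init__(self, n_neighborhoods, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_neighborhoods
        self.values = [0.0] * n_neighborhoods  # running mean reward per neighborhood
        self.rng = random.Random(seed)

    def select(self):
        # Explore a random neighborhood with probability epsilon,
        # otherwise exploit the one with the highest mean reward so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean update. The reward should be shaped (e.g., the
        # sign or normalized size of the cost improvement) so that large
        # penalty-driven cost swings do not destabilize the learning signal,
        # as the abstract argues.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In a local search loop, `select()` would pick the neighborhood to explore at each iteration and `update()` would feed back the shaped reward of the resulting move; with $\epsilon = 0$ the selector degenerates to pure exploitation.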