🤖 AI Summary
This work addresses the pricing problem (PP) in column generation (CG) for the vehicle routing problem (VRP). It proposes an end-to-end reinforcement learning (RL) framework that, unlike previous machine learning approaches to CG, solves the PP on its own, without dynamic programming or any other heuristic. The method uses an attention-based deep RL model to directly generate feasible columns (routes) with the most negative reduced cost. Its core idea is to formulate the PP as a sequential decision-making task, so that a neural network can search the exponentially large column space efficiently. On instances of a VRP variant with 100 customers, the approach solves the PP up to over 300× faster than a dynamic programming (DP)-based heuristic, while keeping the objective gap of the linear programming relaxation within 9%. The framework thus delivers a favorable trade-off among solution accuracy, computational efficiency, and generalization across diverse problem instances.
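For reference, the quantity the pricing problem minimizes can be written as follows. This is a sketch under the usual set-covering master problem, which the summary does not spell out, so every symbol here is an assumption: $c_r$ is the cost of route $r$, $a_{ir}$ is 1 if route $r$ visits customer $i$ and 0 otherwise, $\pi_i$ is the dual value of customer $i$'s covering constraint, and $\Omega$ is the set of feasible routes.

```latex
\bar{c}_r = c_r - \sum_{i \in N} a_{ir}\,\pi_i,
\qquad
\text{PP:}\quad \min_{r \in \Omega} \bar{c}_r
```

A route with $\bar{c}_r < 0$ can improve the current restricted master problem; if no such route exists, the linear relaxation is solved.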
📝 Abstract
In this paper, we address the problem of Column Generation (CG) using Reinforcement Learning (RL). Specifically, we use an RL model based on an attention-mechanism architecture to find the columns with the most negative reduced cost in the Pricing Problem (PP). Unlike previous Machine Learning (ML) applications for CG, our model is end-to-end: it solves the pricing problem independently, without the help of any heuristic. We consider a variant of the Vehicle Routing Problem (VRP) as a case study for our method. In a set of experiments comparing our method against a Dynamic Programming (DP)-based heuristic for solving the PP, we show that our method solves the linear relaxation with an objective gap within 9% in significantly shorter running times, up to over 300 times faster for instances with 100 customers.
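To make the idea of pricing as a sequential decision task concrete, here is a minimal sketch that builds one route node by node and scores it by its reduced cost. It is not the authors' model: a simple greedy rule stands in for the learned attention policy, and the VRP variant is assumed to have only a vehicle capacity constraint. The names `reduced_cost` and `greedy_pricing`, along with the `dist`, `duals`, `demand`, and `capacity` parameters, are hypothetical.

```python
def reduced_cost(route, dist, duals):
    """Travel cost of the route minus the duals of the customers it visits.

    `route` is a list of node indices that starts and ends at the depot (node 0).
    """
    travel = sum(dist[a][b] for a, b in zip(route, route[1:]))
    return travel - sum(duals[i] for i in route[1:-1])


def greedy_pricing(dist, duals, demand, capacity):
    """Construct one route step by step (assumed capacity-only VRP variant).

    At each step the next customer is chosen by a greedy rule; in an RL
    approach this choice would come from a learned policy instead.
    """
    route, load = [0], 0.0
    unvisited = set(range(1, len(dist)))
    while True:
        cur = route[-1]
        best, best_delta = None, 0.0
        for j in sorted(unvisited):
            if load + demand[j] > capacity:
                continue  # visiting j would exceed the vehicle capacity
            # Change in reduced cost if j is visited before returning to depot.
            delta = dist[cur][j] + dist[j][0] - dist[cur][0] - duals[j]
            if delta < best_delta:
                best, best_delta = j, delta
        if best is None:
            break  # no feasible customer improves the reduced cost
        route.append(best)
        load += demand[best]
        unvisited.remove(best)
    route.append(0)  # return to the depot
    return route, reduced_cost(route, dist, duals)
```

In a column generation loop, a route returned with a negative reduced cost would be added to the restricted master problem as a new column, and the duals would be recomputed before the next pricing call.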