Reinforcement Learning for Solving the Pricing Problem in Column Generation: Applications to Vehicle Routing

📅 2025-04-03
🤖 AI Summary
This work addresses the pricing problem (PP) in column generation (CG) for a variant of the vehicle routing problem (VRP). It proposes an end-to-end reinforcement learning (RL) framework that solves the PP on its own, without the help of any heuristic, which sets it apart from previous machine learning applications for CG. The method uses an attention-based RL model to search for columns (routes) with the most negative reduced cost, treating the PP as a sequential decision-making task. In experiments against a dynamic programming (DP)-based heuristic for the PP, the approach solves the linear relaxation to within a 9% objective gap in significantly shorter running times, up to over 300 times faster on instances with 100 customers.
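
The sequential decision-making view of the PP can be illustrated with a small Python sketch. It is not the paper's model: a simple greedy rule stands in for the learned attention policy, and the capacity constraint, Euclidean distances, and all names (`build_route`, `coords`, `duals`, `demand`, `capacity`) are assumptions made for illustration. Each loop iteration is one decision: choose the next customer or stop and return to the depot.

```python
import math


def build_route(coords, duals, demand, capacity, depot=0):
    """Construct one route step by step and return (route, reduced_cost).

    Assumption: a greedy rule replaces the trained policy. At each step it
    picks the feasible customer whose insertion lowers the reduced cost of
    the closed route the most, and stops when no customer lowers it.
    """
    def dist(a, b):
        return math.dist(coords[a], coords[b])

    route, load, reduced_cost = [depot], 0.0, 0.0
    unvisited = set(duals)  # customers are the keys of the dual dictionary

    while unvisited:
        cur = route[-1]
        best, best_delta = None, 0.0
        for j in sorted(unvisited):
            if load + demand[j] > capacity:
                continue  # action masked out: capacity would be exceeded
            # Change in reduced cost of the closed route if j is appended
            # before returning to the depot.
            delta = dist(cur, j) + dist(j, depot) - dist(cur, depot) - duals[j]
            if delta < best_delta:
                best, best_delta = j, delta
        if best is None:
            break  # no improving action left: return to the depot
        route.append(best)
        load += demand[best]
        reduced_cost += best_delta
        unvisited.remove(best)

    route.append(depot)
    return route, reduced_cost


if __name__ == "__main__":
    coords = {0: (0, 0), 1: (3, 4), 2: (3, 0)}
    duals = {1: 12.0, 2: 7.0}
    demand = {1: 1, 2: 1}
    print(build_route(coords, duals, demand, capacity=2))
```

A learned policy would replace the inner `for` loop with a forward pass that scores the unmasked customers. If no customer improves the reduced cost, the sketch returns the empty route `[depot, depot]` with reduced cost 0, which signals that this rule found no improving column.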

📝 Abstract
In this paper, we address the problem of Column Generation (CG) using Reinforcement Learning (RL). Specifically, we use an RL model based on the attention-mechanism architecture to find the columns with the most negative reduced cost in the Pricing Problem (PP). Unlike previous Machine Learning (ML) applications for CG, our model deploys an end-to-end mechanism, as it independently solves the pricing problem without the help of any heuristic. We consider a variant of the Vehicle Routing Problem (VRP) as a case study for our method. Through a set of experiments in which our method is compared against a Dynamic Programming (DP)-based heuristic for solving the PP, we show that our method solves the linear relaxation to within a 9% objective gap in significantly shorter running times, up to over 300 times faster for instances with 100 customers.
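
To make "most negative reduced cost" concrete, here is a minimal Python sketch. It assumes a set-partitioning master problem with one dual value per customer and Euclidean travel costs; the abstract does not specify the VRP variant, so these choices and the names `route_cost`, `reduced_cost`, `coords`, and `duals` are illustrative assumptions, not the paper's formulation.

```python
import math


def route_cost(route, coords):
    """Total Euclidean length of a route given as a list of node ids."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))


def reduced_cost(route, coords, duals):
    """Route cost minus the duals of the customers it visits.

    Assumption: the route starts and ends at the depot, so the customers
    are route[1:-1], and each covering constraint has one dual value.
    """
    return route_cost(route, coords) - sum(duals[i] for i in route[1:-1])


if __name__ == "__main__":
    coords = {0: (0, 0), 1: (3, 4), 2: (3, 0)}  # node 0 is the depot
    duals = {1: 8.0, 2: 6.0}
    print(reduced_cost([0, 1, 2, 0], coords, duals))
```

In this toy instance the route has length 5 + 4 + 3 = 12 and collects duals 8 + 6 = 14, so its reduced cost is negative and adding it as a column could improve the restricted master problem.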
Problem

Research questions and friction points this paper is trying to address.

Using RL to solve Column Generation's Pricing Problem
End-to-end RL model replaces heuristics for VRP
Solves the linear relaxation within a 9% objective gap in shorter running times
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning for Column Generation
Attention-based, end-to-end solution of the pricing problem
Up to over 300 times faster than a DP-based heuristic on 100-customer instances