Construct, Merge, Solve&Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem

📅 2026-02-27
🤖 AI Summary
This work proposes RL-CMSA, a hybrid approach for the symmetric single-depot min-max multiple Traveling Salesman Problem (min-max mTSP), in which the objective is to minimize the longest tour among all salesmen, reflecting workload balance. RL-CMSA integrates reinforcement learning into the Construct, Merge, Solve&Adapt (CMSA) framework. It learns pairwise q-values, reinforced whenever two cities share a route in a high-quality solution, and uses them to guide probabilistic clustering during solution construction. Routes from the constructed solutions are merged into a compact pool that is adapted through ageing and pruning; a restricted set-covering mixed-integer linear programming (MILP) model is solved over this pool, and the result is refined with inter-route remove, shift, and swap moves. On random and TSPLIB instances, RL-CMSA consistently finds best or near-best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as the instance size and the number of salesmen increase.
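
To make the min-max objective concrete, the following minimal sketch evaluates a candidate solution as the length of its longest depot-to-depot tour. The function names, the Euclidean distance, and the five-point toy instance are illustrative assumptions, not taken from the paper.

```python
import math

def tour_length(depot, route, coords):
    """Length of the closed tour depot -> route[0] -> ... -> route[-1] -> depot."""
    stops = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def min_max_objective(depot, routes, coords):
    """Min-max mTSP objective: the length of the longest tour."""
    return max(tour_length(depot, r, coords) for r in routes)

# Hypothetical toy instance: depot 0 and four customers.
coords = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (3.0, 4.0), 3: (0.0, 4.0), 4: (-3.0, 0.0)}

unbalanced = [[1, 2, 3], [4]]   # tour lengths 14 and 6
balanced = [[1, 2], [3, 4]]     # tour lengths 12 and 12

print(min_max_objective(0, unbalanced, coords))  # 14.0
print(min_max_objective(0, balanced, coords))    # 12.0
```

The second solution has a larger total length (24 versus 20) but a shorter longest tour, which is why the min-max objective favours balanced workloads.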

📝 Abstract
The Multiple Traveling Salesman Problem (mTSP) extends the Traveling Salesman Problem to m tours that start and end at a common depot and jointly visit all customers exactly once. In the min-max variant, the objective is to minimize the longest tour, reflecting workload balance. We propose a hybrid approach, Construct, Merge, Solve&Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max mTSP. The method iteratively constructs diverse solutions using probabilistic clustering guided by learned pairwise q-values, merges routes into a compact pool, solves a restricted set-covering MILP, and refines solutions via inter-route remove, shift, and swap moves. The q-values are updated by reinforcing city-pair co-occurrences in high-quality solutions, while the pool is adapted through ageing and pruning. This combination of exact optimization and reinforcement-guided construction balances exploration and exploitation. Computational results on random and TSPLIB instances show that RL-CMSA consistently finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase.
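
The abstract does not give the exact q-value update or the sampling rule, so the sketch below is only one plausible reading of "reinforcing city-pair co-occurrences" and "probabilistic clustering guided by learned pairwise q-values". The learning rate `alpha`, the smoothing term `eps`, the random seeding of clusters, and all function names are assumptions made for illustration.

```python
import random
from itertools import combinations

def reinforce(q, routes, alpha=0.1):
    """Move q[(i, j)] toward 1 for every pair of cities that share a route.

    Assumed update rule: q <- q + alpha * (1 - q). Keys are pairs with i < j.
    """
    for route in routes:
        for i, j in combinations(sorted(route), 2):
            old = q.get((i, j), 0.0)
            q[(i, j)] = old + alpha * (1.0 - old)
    return q

def pair_q(q, i, j):
    """Symmetric lookup; unseen pairs default to 0."""
    return q.get((min(i, j), max(i, j)), 0.0)

def sample_clusters(cities, m, q, rng, eps=0.05):
    """Split cities into m non-empty clusters (requires len(cities) >= m).

    Each remaining city joins a cluster with probability proportional to
    eps plus its mean q-value with the cities already in that cluster.
    """
    order = list(cities)
    rng.shuffle(order)
    clusters = [[c] for c in order[:m]]  # one random seed city per cluster
    for c in order[m:]:
        weights = [eps + sum(pair_q(q, c, d) for d in cl) / len(cl)
                   for cl in clusters]
        k = rng.choices(range(m), weights=weights)[0]
        clusters[k].append(c)
    return clusters

# Reinforce the pairs of a (hypothetical) high-quality solution, then sample.
q = {}
reinforce(q, [[1, 2], [3, 4]])
reinforce(q, [[1, 2], [3, 4]])
clusters = sample_clusters([1, 2, 3, 4], m=2, q=q, rng=random.Random(0))
print(q)
print(clusters)
```

In this sketch each cluster would then be turned into a route, for example by a TSP heuristic, before its routes enter the pool. The `eps` term keeps every cluster reachable, so construction still explores pairings that have not yet been reinforced.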
Problem

Research questions and friction points this paper is trying to address.

min-max mTSP
Multiple Traveling Salesman Problem
workload balance
symmetric single-depot
combinatorial optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Multiple Traveling Salesman Problem
Hybrid Optimization
Set-Covering MILP
Probabilistic Clustering