🤖 AI Summary
Existing reinforcement learning–based multi-task neural combinatorial optimization methods suffer from difficulties in training large-scale decoders and poor generalization across diverse problem variants. To address these challenges for the multi-variant Vehicle Routing Problem (VRP), this paper proposes a generalized unified neural solver. Its key contributions are: (1) a novel multi-task learning framework, MTL-KD, which leverages knowledge distillation to mitigate inter-task gradient conflicts and enable stable training of heavy decoders; and (2) a Random Reordering Re-Construction (R3C) inference strategy that enhances adaptability to heterogeneous VRP variants. Extensive experiments on 6 seen and 10 unseen VRP variants, scaling up to 1,000 nodes, demonstrate significant improvements over state-of-the-art baselines. The solver exhibits strong cross-task and cross-scale generalization, consistently outperforming prior methods on both uniformly distributed benchmarks and real-world road network instances.
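The distillation idea in the summary amounts to supervising the student decoder's per-step action distribution with a teacher policy's, so no solution labels are needed. A minimal sketch of that loss in plain NumPy (the function name `distillation_loss` and the setup are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over next-node candidate logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy between the teacher's action distribution (soft
    targets) and the student's predicted distribution for one
    decoding step; minimized when the student matches the teacher."""
    p = softmax(student_logits)
    return -float(np.sum(teacher_probs * np.log(p + 1e-12)))

# Toy example: a teacher policy's distribution over 3 candidate nodes.
teacher = np.array([0.7, 0.2, 0.1])
aligned = distillation_loss(np.log(teacher), teacher)      # equals teacher entropy
mismatched = distillation_loss(np.log(teacher[::-1].copy()), teacher)
```

Averaging this loss over decoding steps and over several single-task teachers is one plausible way to realize the label-free multi-task training the summary describes.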
📝 Abstract
Multi-Task Learning (MTL) in Neural Combinatorial Optimization (NCO) is a promising approach to train a unified model capable of solving multiple Vehicle Routing Problem (VRP) variants. However, existing Reinforcement Learning (RL)-based multi-task methods can only train light decoder models on small-scale problems, exhibiting limited generalization ability when solving large-scale problems. To overcome this limitation, this work introduces a novel multi-task learning method driven by knowledge distillation (MTL-KD), which enables the efficient training of heavy decoder models with strong generalization ability. The proposed MTL-KD method transfers policy knowledge from multiple distinct RL-based single-task models to a single heavy decoder model, facilitating label-free training and effectively improving the model's generalization ability across diverse tasks. In addition, we introduce a flexible inference strategy termed Random Reordering Re-Construction (R3C), which is specifically adapted for diverse VRP tasks and further boosts the performance of the multi-task model. Experimental results on 6 seen and 10 unseen VRP variants with up to 1000 nodes indicate that our proposed method consistently achieves superior performance on both uniform and real-world benchmarks, demonstrating robust generalization abilities.
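The R3C strategy described above (randomly select part of a solution, then reconstruct it) can be illustrated with a toy improvement loop on a tour. Here a greedy nearest-neighbor rebuild stands in for the learned decoder, and `r3c_step` and `seg_len` are hypothetical names for illustration only, not the paper's API:

```python
import math
import random

def tour_length(tour, coords):
    """Total cyclic length of a tour over 2-D coordinates."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def r3c_step(tour, coords, seg_len=4, rng=random):
    """One reorder-and-reconstruct move: destroy a random contiguous
    segment and rebuild it greedily from its left anchor (a stand-in
    for the neural decoder); keep the result only if it is shorter."""
    n = len(tour)
    i = rng.randrange(1, n - seg_len)          # keep position 0 fixed as a depot
    prefix, segment, suffix = tour[:i], tour[i:i + seg_len], tour[i + seg_len:]
    rebuilt, pool, cur = [], set(segment), prefix[-1]
    while pool:
        nxt = min(pool, key=lambda v: math.dist(coords[cur], coords[v]))
        rebuilt.append(nxt)
        pool.remove(nxt)
        cur = nxt
    cand = prefix + rebuilt + suffix
    return cand if tour_length(cand, coords) < tour_length(tour, coords) else tour
```

Repeating such steps with different random segments explores many partial reconstructions of the same solution; in the paper's setting the reconstruction would be performed by the trained heavy-decoder model under each variant's constraints rather than by this greedy heuristic.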