🤖 AI Summary
Traditional combinatorial optimization (CO) solvers rely heavily on domain expertise and problem-specific algorithms, lacking generality; existing large language model (LLM)-based approaches still require code generation or external solver invocation, failing to achieve end-to-end natural-language-to-solution mapping. This work proposes the first purely language-driven, end-to-end CO solving framework—eliminating manual algorithm design, coding, and solver integration. Our core innovation is feasibility- and optimality-aware reinforcement learning (FOARL), combined with supervised fine-tuning, applied in a two-stage training regimen for a 7B-parameter LLM. Evaluated on seven NP-hard problems, our method achieves average optimality gaps of only 1.03%–8.20%, substantially outperforming GPT-4o, DeepSeek-R1, and classical heuristics, while maintaining high solution feasibility. The framework establishes a unified, accessible, language-native optimization paradigm for complex decision-making domains such as logistics and manufacturing.
📝 Abstract
Combinatorial optimization (CO) problems, central to decision-making scenarios like logistics and manufacturing, are traditionally solved using problem-specific algorithms requiring significant domain expertise. While large language models (LLMs) have shown promise in automating CO problem solving, existing approaches rely on intermediate steps such as code generation or solver invocation, limiting their generality and accessibility. This paper introduces a novel framework that empowers LLMs to serve as end-to-end CO solvers by directly mapping natural language problem descriptions to solutions. We propose a two-stage training strategy: supervised fine-tuning (SFT) equips LLMs with solution generation patterns learned from domain-specific solvers, while a feasibility-and-optimality-aware reinforcement learning (FOARL) process explicitly mitigates constraint violations and refines solution quality. Evaluation across seven NP-hard CO problems shows that our method achieves a high feasibility rate and reduces the average optimality gap to 1.03%–8.20% by tuning a 7B-parameter LLM, surpassing general-purpose LLMs (e.g., GPT-4o), reasoning models (e.g., DeepSeek-R1), and domain-specific heuristics. Our method establishes a unified language-based pipeline for CO without extensive code execution or manual architectural adjustments for different problems, offering a general and language-driven alternative to traditional solver design while maintaining relative feasibility guarantees.
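To make the FOARL idea concrete, the sketch below shows one plausible shape for a feasibility- and optimality-aware reward signal. This is an illustrative assumption, not the paper's actual reward function: the function name `foarl_reward`, its arguments, and the specific penalty scheme are hypothetical, reflecting only the abstract's description that FOARL penalizes constraint violations and refines solution quality relative to a reference solver.

```python
def foarl_reward(solution_cost: float,
                 reference_cost: float,
                 num_violations: int,
                 violation_penalty: float = 1.0) -> float:
    """Hypothetical reward in the spirit of FOARL (not the paper's exact form).

    Infeasible solutions receive a negative reward proportional to the
    number of violated constraints; feasible solutions receive a reward
    that grows as the optimality gap (relative to a reference solver's
    cost) shrinks, reaching 1.0 at the reference optimum.
    """
    if num_violations > 0:
        # Feasibility-aware term: penalize each constraint violation.
        return -violation_penalty * num_violations
    # Optimality-aware term: reward small relative gaps, clipped at zero.
    gap = (solution_cost - reference_cost) / reference_cost
    return max(0.0, 1.0 - gap)
```

Under this formulation, a feasible solution 5% worse than the reference would earn roughly 0.95, while any infeasible solution scores below zero, so the policy is pushed toward feasibility first and optimality second.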