CHORUS: Zero-shot Hierarchical Retrieval and Orchestration for Generating Linear Programming Code

📅 2025-05-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Non-expert users struggle to translate natural language–described linear programming (LP) problems into executable optimization code. Method: This paper proposes a retrieval-augmented generation (RAG) framework for NL4Opt, featuring (i) a novel hierarchical tree-based document chunking strategy with self-generated semantic metadata; (ii) a two-stage retrieval pipeline followed by cross-encoder re-ranking; and (iii) integration of expert-crafted prompting and structured parsing reasoning chains. Contribution/Results: It achieves, for the first time, zero-shot, end-to-end generation of syntactically and semantically correct Gurobi-executable code directly from natural language LP descriptions. On the NL4Opt-Code benchmark, it significantly outperforms existing RAG baselines and conventional approaches. Notably, open-weight models—e.g., Llama3.1-8B—attain performance on par with or exceeding that of GPT-3.5 and GPT-4, while reducing computational overhead substantially.

📝 Abstract
Linear Programming (LP) problems aim to find the optimal solution to an objective under constraints. These problems typically require domain knowledge, mathematical skills, and programming ability, presenting significant challenges for non-experts. This study explores the efficiency of Large Language Models (LLMs) in generating solver-specific LP code. We propose CHORUS, a retrieval-augmented generation (RAG) framework for synthesizing Gurobi-based LP code from natural language problem statements. CHORUS incorporates a hierarchical tree-like chunking strategy for theoretical content and generates additional metadata from the code examples in the documentation to facilitate self-contained, semantically coherent retrieval. CHORUS's two-stage retrieval approach, followed by cross-encoder reranking, further ensures contextual relevance. Finally, an expertly crafted prompt and a structured parser with reasoning steps significantly improve code generation performance. Experiments on the NL4Opt-Code benchmark show that CHORUS improves the performance of open-source LLMs such as Llama3.1 (8B), Llama3.3 (70B), Phi4 (14B), Deepseek-r1 (32B), and Qwen2.5-coder (32B) by a significant margin over baseline and conventional RAG. It also allows these open-source LLMs to match or outperform much stronger baselines (GPT-3.5 and GPT-4) while requiring far fewer computational resources. Ablation studies further demonstrate the importance of expert prompting, hierarchical chunking, and structured reasoning.
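The hierarchical tree-like chunking the abstract describes can be sketched roughly as below. This is an illustrative, stdlib-only sketch, not the paper's implementation: the `#`-style heading markers, the `Chunk` structure, and the ancestor-path metadata are all assumptions standing in for the paper's self-generated semantic metadata.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    title: str
    text: str
    metadata: dict                      # stand-in for self-generated semantic metadata
    children: list = field(default_factory=list)

def build_tree(doc: str) -> Chunk:
    """Split a document into a heading-based tree so each chunk stays
    self-contained (assumes markdown-style '#' heading markers)."""
    root = Chunk("root", "", {"level": 0})
    stack = [(0, root)]                 # (heading level, node) ancestors
    for line in doc.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            node = Chunk(line.lstrip("# "), "", {"level": level})
            while stack and stack[-1][0] >= level:
                stack.pop()             # close deeper/equal sections
            stack[-1][1].children.append(node)
            stack.append((level, node))
        else:
            stack[-1][1].text += line + "\n"
    return root

def flatten(node: Chunk, path=()):
    """Yield chunks with their ancestor titles joined into a path, so
    each retrieval unit carries its hierarchical context."""
    here = path + (node.title,)
    yield {"path": " > ".join(here), "text": node.text.strip()}
    for child in node.children:
        yield from flatten(child, here)
```

Keeping the ancestor path on every chunk is what makes a leaf chunk "self-contained": a hit on a deeply nested code example still tells the generator which part of the solver documentation it came from.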
Problem

Research questions and friction points this paper is trying to address.

Generating LP code from natural language using LLMs
Enhancing retrieval-augmented generation with hierarchical strategies
Improving code generation performance for non-experts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical tree-like chunking for theoretical content
Two-stage retrieval with cross-encoder reranking
Expert prompting and structured reasoning parser
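The two-stage retrieval idea above can be illustrated with a stdlib-only toy: a cheap bag-of-words cosine stands in for the first-stage retriever, and a more expensive pairwise overlap score stands in for the cross-encoder reranker. The scoring functions and example documents are assumptions for illustration; the actual system uses dense embeddings and a learned cross-encoder.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def first_stage(query: str, docs: list, k: int = 10) -> list:
    """Stage 1: cheap scoring over all chunks (stands in for a
    bi-encoder / dense retriever) to produce a shortlist."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def rerank(query: str, candidates: list, k: int = 3) -> list:
    """Stage 2: a costlier pairwise score over the shortlist only
    (stands in for a cross-encoder reranker)."""
    q = set(query.lower().split())
    def pair_score(d):
        dset = set(d.lower().split())
        return len(q & dset) / len(q | dset)   # Jaccard overlap
    return sorted(candidates, key=pair_score, reverse=True)[:k]

docs = [
    "maximize profit subject to resource constraints",
    "gurobi model addVar addConstr setObjective optimize",
    "python list comprehension tutorial",
]
query = "maximize profit with constraints"
top = rerank(query, first_stage(query, docs))
```

The design point is the same as in the paper's pipeline: the expensive pairwise scorer only ever sees the small stage-1 shortlist, so contextual relevance improves without paying the cross-encoder cost over the whole corpus.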