Multi-LLM Query Optimization

📅 2026-03-24
🤖 AI Summary
This work addresses the query allocation problem that arises when multiple large language models (LLMs) are deployed in parallel for classification: minimize total query cost while satisfying a reliability constraint for every possible ground-truth label. The problem is formulated as an offline optimization task with statewise error constraints and is shown to be NP-hard. To work around this, the authors build a surrogate by combining a union bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The surrogate has a closed-form, multiplicatively separable expression in the query counts and preserves feasibility. It is also asymptotically tight: the ratio of surrogate-optimal cost to true optimal cost converges to one at an explicit rate as the error tolerances shrink. Finally, an asymptotic fully polynomial-time approximation scheme (AFPTAS) returns a surrogate-feasible query plan within a (1+ε) factor of the surrogate optimum.

📝 Abstract
Deploying multiple large language models (LLMs) in parallel to classify an unknown ground-truth label is a common practice, yet the problem of optimally allocating queries across heterogeneous models remains poorly understood. In this paper, we formulate a robust, offline query-planning problem that minimizes total query cost subject to statewise error constraints which guarantee reliability for every possible ground-truth label. We first establish that this problem is NP-hard via a reduction from the minimum-weight set cover problem. To overcome this intractability, we develop a surrogate by combining a union bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The resulting surrogate admits a closed-form, multiplicatively separable expression in the query counts and is guaranteed to be feasibility-preserving. We further show that the surrogate is asymptotically tight at the optimization level: the ratio of surrogate-optimal cost to true optimal cost converges to one as error tolerances shrink, with an explicit rate of $O\left(\log\log(1/\alpha_{\min}) / \log(1/\alpha_{\min})\right)$. Finally, we design an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible query plan within a $(1+\varepsilon)$ factor of the surrogate optimum.
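
To make the surrogate idea concrete, here is a minimal Python sketch of a union bound over pairwise comparisons combined with a Chernoff-type bound. It is an illustration under stated assumptions, not the paper's construction: the abstract does not give the surrogate's exact form, so the sketch assumes each model returns i.i.d. responses from a known label-conditional distribution and uses the Bhattacharyya coefficient (the Chernoff bound at exponent 1/2) as the per-query decay factor. The names `bhattacharyya`, `surrogate_error`, `cheapest_plan`, and the toy numbers are all hypothetical. The search is plain brute force over small query counts; it stands in for, and is not, the AFPTAS.

```python
import itertools
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two discrete distributions (a Chernoff-type factor)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def surrogate_error(true_label, counts, response_dists):
    """Union bound over rival labels of a product of per-model decay factors.

    response_dists[m][y] is the assumed response distribution of model m
    when the ground-truth label is y; counts[m] is the number of queries to model m.
    """
    num_labels = len(response_dists[0])
    total = 0.0
    for rival in range(num_labels):
        if rival == true_label:
            continue
        term = 1.0
        for m, n in enumerate(counts):
            rho = bhattacharyya(response_dists[m][true_label],
                                response_dists[m][rival])
            term *= rho ** n  # multiplicatively separable in the query counts
        total += term
    return total


def cheapest_plan(costs, response_dists, alphas, max_queries):
    """Brute-force the cheapest plan meeting every statewise surrogate constraint."""
    best = None
    for counts in itertools.product(range(max_queries + 1), repeat=len(costs)):
        feasible = all(
            surrogate_error(y, counts, response_dists) <= alphas[y]
            for y in range(len(alphas))
        )
        if feasible:
            cost = sum(c * n for c, n in zip(costs, counts))
            if best is None or cost < best[0]:
                best = (cost, counts)
    return best


# Toy instance (hypothetical numbers): two labels, two models.
# Model 0 is accurate but costly; model 1 is cheap but noisy.
example_dists = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.6, 0.4], [0.4, 0.6]],
]
example_costs = [5, 1]
example_alphas = [0.05, 0.05]

print(cheapest_plan(example_costs, example_dists, example_alphas, max_queries=8))
```

Because the bound on each pairwise term is a product of per-model factors, any plan that satisfies these constraints also satisfies the corresponding error constraints under the sketch's assumptions, which is the feasibility-preserving property in miniature. The brute-force search is exponential in the number of models and is only meant for tiny instances.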
Problem

Research questions and friction points this paper is trying to address.

Multi-LLM
Query Optimization
Error Constraints
Cost Minimization
Reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-LLM query optimization
Statewise error constraints
Surrogate optimization
Asymptotic tightness
AFPTAS
Arlen Dean
Washington University in St. Louis, Olin Business School
Zijin Zhang
Boston College, Carroll School of Management
Stefanus Jasin
Unknown affiliation
Yuqing Liu
University of Michigan, Industrial and Operations Engineering