Asymptotically Optimal Sequential Testing with Heterogeneous LLMs

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a Bayesian binary sequential hypothesis testing problem in a setting with multiple heterogeneous large language models (LLMs), where each LLM exhibits asymmetric accuracy, a heterogeneous query cost, and random waiting times. The objective is to minimize the total expected cost, comprising query expenses and waiting delays. The authors propose a belief-dependent hybrid-switching policy that dynamically selects among available LLMs and terminates once the posterior probability crosses a decision threshold. Theoretical analysis reveals that, as the error tolerance α approaches zero, the optimal strategy requires at most two LLMs. By combining Bayesian inference, sequential analysis, information-rate modeling, and extreme-point optimization over convex sets, the proposed method achieves asymptotic optimality, attaining a (1+o(1)) multiplicative factor of the universal lower bound under sub-Gaussian assumptions on waiting times.
📝 Abstract
We study a Bayesian binary sequential hypothesis testing problem with multiple large language models (LLMs). Each LLM $j$ has per-query cost $c_j>0$, random waiting time with mean $\mu_j>0$ and sub-Gaussian tails, and \emph{asymmetric} accuracies: the probability of returning the correct label depends on the true hypothesis $θ\in\{A,B\}$ and need not be the same under $A$ and $B$. This asymmetry induces two distinct information rates $(I_{j,A}, I_{j,B})$ per LLM, one under each hypothesis. The decision-maker chooses LLMs sequentially, observes their noisy binary answers, and stops when the posterior probability of one hypothesis exceeds $1-α$. The objective is to minimize the sum of expected query cost and expected waiting cost, $\mathbb{E}[C_π] + \mathbb{E}[g(W_π)]$, where $C_π$ is the total query cost, $W_π$ is the total waiting time, and $g$ is a polynomial function (e.g., $g(x)=x^ρ$ with $ρ\ge 1$). We prove that as the error tolerance $α\to0$, the optimal policy is asymptotically equivalent to one that uses at most two LLMs. Notably, a single-LLM policy is \emph{not} generically optimal: optimality requires exploiting a two-dimensional tradeoff between information under $A$ and information under $B$. Any admissible policy induces an expected information-allocation vector in $\mathbb{R}_+^2$, and we show that when $α$ is sufficiently small, the optimal allocation lies at an extreme point of the associated convex set, and hence uses at most two LLMs. We construct belief-dependent policies that first mix between two LLMs when the posterior is ambiguous, and then switch to a single ``specialist'' LLM when the posterior is sufficiently close to one of the hypotheses. These policies match the universal lower bound up to a $(1+o(1))$ factor as $α\to 0$.
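The mix-then-specialize policy described in the abstract can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's actual algorithm: the belief thresholds (0.25/0.75), the uniform mixing rule, and the exponential waiting times are all hypothetical choices made for illustration. What the paper and this sketch share is the core loop: a Bayes update of the posterior after each noisy binary answer, hypothesis-dependent accuracies `acc_A`/`acc_B`, and stopping once the posterior leaves $(\alpha, 1-\alpha)$.

```python
import random


def run_sequential_test(llms, truth, alpha=0.01, prior=0.5, rng=None):
    """Illustrative belief-dependent sequential test (hypothetical sketch).

    llms:  list of dicts with keys "acc_A" (P(answer=A | theta=A)),
           "acc_B" (P(answer=B | theta=B)), "cost", "mean_wait".
    truth: the true hypothesis, "A" or "B".
    Stops when the posterior P(theta = A) leaves (alpha, 1 - alpha).
    Returns (decision, total_query_cost, total_waiting_time).
    """
    rng = rng or random.Random(0)
    belief = prior  # posterior probability of hypothesis A
    cost = 0.0
    wait = 0.0
    while alpha < belief < 1 - alpha:
        # Belief-dependent selection: mix while the posterior is
        # ambiguous, then switch to the "specialist" LLM for the
        # currently leading hypothesis (thresholds are illustrative).
        if 0.25 < belief < 0.75:
            j = rng.randrange(len(llms))  # mixing phase
        elif belief >= 0.75:
            j = max(range(len(llms)), key=lambda k: llms[k]["acc_A"])
        else:
            j = max(range(len(llms)), key=lambda k: llms[k]["acc_B"])
        m = llms[j]
        cost += m["cost"]
        # Exponential waiting time is an assumption for the demo;
        # the paper only requires sub-Gaussian tails.
        wait += rng.expovariate(1.0 / m["mean_wait"])
        # Draw a noisy binary answer, correct with a probability
        # that depends on the true hypothesis (asymmetric accuracy).
        p_correct = m["acc_A"] if truth == "A" else m["acc_B"]
        correct = rng.random() < p_correct
        answer = truth if correct else ("B" if truth == "A" else "A")
        # Bayes update of P(theta = A) given the answer.
        like_A = m["acc_A"] if answer == "A" else 1 - m["acc_A"]
        like_B = 1 - m["acc_B"] if answer == "A" else m["acc_B"]
        num = like_A * belief
        belief = num / (num + like_B * (1 - belief))
    decision = "A" if belief >= 1 - alpha else "B"
    return decision, cost, wait
```

Note how the two-dimensional tradeoff appears here: the first LLM below is an $A$-specialist (high `acc_A`) and the second a cheaper $B$-specialist, so which one is queried depends on where the posterior currently sits.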
Problem

Research questions and friction points this paper is trying to address.

sequential hypothesis testing
heterogeneous LLMs
asymmetric accuracy
information rate
cost minimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

sequential hypothesis testing
heterogeneous LLMs
asymmetric accuracy
information rate tradeoff
asymptotically optimal policy
👥 Authors
Guokai Li
Jiaxin Liang — The Chinese University of Hong Kong (wireless communication, system implementation, Internet of things)
Mo Liu
Yanzhe Lei
Stefanus Jasin
Fenghua Yang
Preet Baxi