ZEBRA: Zero-shot Budgeted Resource Allocation for LLM Orchestration

📅 2026-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses how to split a fixed budget across the stages of a multi-agent pipeline at inference time. It proposes the first zero-shot, multi-stage budget allocation framework, formulating the problem as a continuous nonlinear knapsack optimization. An LLM controller estimates stage-wise utility curves, and a water-filling search on the Lagrange multiplier computes the per-stage split. The same solver handles both additive and multiplicative aggregation, and the resulting split adapts to the pipeline structure. On a 150-task APPS coding benchmark, the approach recovers 94.4% of unconstrained quality at 50% of the unconstrained spend, versus 88.1% for direct allocation by an LLM (LLM-direct). On a 3-phase HotpotQA pipeline, it beats LLM-direct by 14.3 percentage points, and its allocations are empirically robust to noise in utility estimation.
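One plausible way to write down the knapsack formulation summarized above is sketched here. The notation is an assumption for illustration and is not taken from the paper: $u_i$ is the estimated utility curve of stage $i$ (assumed concave, increasing, and positive where the logarithm is taken), $b_i$ is the budget given to stage $i$, $B$ is the total budget, $\lambda$ is the Lagrange multiplier, and $f_i$ stands for $u_i$ in the additive case or $\log u_i$ in the multiplicative case.

```latex
\begin{aligned}
\text{additive:}\quad
  & \max_{b_1,\dots,b_n \ge 0} \; \sum_{i=1}^{n} u_i(b_i)
  \quad \text{s.t.} \quad \sum_{i=1}^{n} b_i \le B, \\
\text{multiplicative:}\quad
  & \max_{b_1,\dots,b_n \ge 0} \; \prod_{i=1}^{n} u_i(b_i)
  \;\Longleftrightarrow\;
  \max_{b_1,\dots,b_n \ge 0} \; \sum_{i=1}^{n} \log u_i(b_i)
  \quad \text{s.t.} \quad \sum_{i=1}^{n} b_i \le B, \\
\text{stationarity:}\quad
  & f_i'(b_i^\star) = \lambda \ \text{ if } b_i^\star > 0,
  \qquad
  f_i'(0) \le \lambda \ \text{ if } b_i^\star = 0 .
\end{aligned}
```

Under these assumptions both cases share one structure: every funded stage has the same marginal gain $\lambda$, so a one-dimensional search over $\lambda$ is enough to recover the whole split.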
📝 Abstract
As autonomous agents increasingly execute end-to-end tasks under fixed monetary budgets, the pressing open question shifts from whether the budget is respected to how to spend it effectively. Existing budget-aware methods typically control reasoning step-by-step within a single agent, or learn resource allocation policies via RL. None addresses how to split a budget across the component phases of a multi-agent pipeline at inference time. We propose ZEBRA, a zero-shot framework that reduces multi-phase budget allocation to a continuous nonlinear knapsack problem: an LLM controller estimates per-phase utility curves, and a water-filling search on the Lagrange multiplier returns the per-phase split. Additive and multiplicative aggregations are unified under the same solver. On a $150$-task APPS coding benchmark, both ZEBRA variants outperform LLM-direct (budget allocation directly by an LLM) on every aggregate metric. At a budget of $\alpha = 0.5$ of the unconstrained spend, ZEBRA recovers $94.4\%$ of unconstrained quality, versus $88.1\%$ for LLM-direct. The advantage is statistically significant and transfers beyond coding: on a $3$-phase HotpotQA pipeline, ZEBRA beats LLM-direct by $14.3$pp, with allocations empirically robust to curve-estimation noise. On HotpotQA, ZEBRA arrives at a near-balanced budget split, unlike the APPS split, which is skewed towards a refinement phase; this shows adaptation to the pipeline structure. More broadly, we show that lightweight algorithmic guidance at inference time can improve the economic behavior of autonomous multi-agent systems.
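The water-filling search described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the saturating-exponential curve family $u(b) = a(1 - e^{-kb})$, the function names (`demand_additive`, `demand_multiplicative`, `water_fill`), and the example parameters are all hypothetical, chosen only because they give closed-form per-phase demands. The solver bisects on the multiplier `lam` until the per-phase demands sum to the budget; swapping the demand function switches between additive and multiplicative aggregation.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical curve family (an assumption, not from the paper):
# u(b) = a * (1 - exp(-k * b)), concave and increasing, with a > 0 and k > 0.
Curve = Tuple[float, float]  # (a, k)


def demand_additive(curve: Curve, lam: float) -> float:
    """Budget b at which u'(b) = a*k*exp(-k*b) equals lam, clipped at 0."""
    a, k = curve
    return max(0.0, math.log(a * k / lam) / k)


def demand_multiplicative(curve: Curve, lam: float) -> float:
    """Budget b at which d/db log u(b) = k / (exp(k*b) - 1) equals lam."""
    _, k = curve
    return math.log1p(k / lam) / k


def water_fill(
    curves: Sequence[Curve],
    budget: float,
    demand: Callable[[Curve, float], float],
    iters: int = 200,
) -> List[float]:
    """Bisect on the multiplier lam so that total demand meets the budget."""
    lo, hi = 1e-12, 1e12  # bracket for lam; total demand decreases as lam grows
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint, since lam spans decades
        spent = sum(demand(c, mid) for c in curves)
        if spent > budget:
            lo = mid  # price too low: phases ask for too much, raise lam
        else:
            hi = mid  # feasible: try a lower price
    # hi always satisfies the budget constraint, so allocate at hi.
    return [demand(c, hi) for c in curves]


if __name__ == "__main__":
    curves = [(1.0, 0.5), (1.0, 2.0), (0.5, 1.0)]  # illustrative values only
    print(water_fill(curves, 3.0, demand_additive))
    print(water_fill(curves, 3.0, demand_multiplicative))
```

In this sketch, the additive demand can drop a phase to zero when its initial marginal gain is below the multiplier, while the multiplicative demand always funds every phase, because a zero-utility phase would zero out the product.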
Problem

Research questions and friction points this paper is trying to address.

budget allocation
multi-agent systems
LLM orchestration
zero-shot
resource allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot budget allocation
LLM orchestration
nonlinear knapsack
multi-agent pipeline
inference-time optimization