More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the diminishing marginal returns observed when repeatedly sampling from large language models under a fixed computational budget, which cause problem coverage to grow slowly. To improve coverage efficiency, the authors propose Reset-and-Discard (ReD), a query method built around an adaptive reset-and-discard mechanism. They establish, for the first time, a quantitative link between the power-law behavior of pass@k and the sublinear growth of coverage@cost, and leverage this insight to design a general query strategy that requires no prior knowledge of pass@k and automatically infers the power-law exponent. Evaluated on the HumanEval benchmark, ReD substantially reduces the number of attempts, token consumption, and overall cost required to reach a target coverage across three mainstream large language models, while also offering a novel and efficient approach to measuring reasoning-related power laws.
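The summary mentions that ReD can automatically infer the power-law exponent of pass@k. A minimal sketch of how such an exponent could be recovered, assuming (hypothetically) that the failure rate follows 1 - pass@k ≈ C·k^(-α), is a least-squares fit in log-log space; the constants `alpha_true` and `C` below are made-up illustration values, not figures from the paper:

```python
import math

# Hypothetical failure curve: 1 - pass@k = C * k**(-alpha), alpha = 0.7.
alpha_true, C = 0.7, 0.9
ks = [1, 2, 4, 8, 16, 32, 64]
fail = [C * k ** (-alpha_true) for k in ks]

# Least-squares slope in log-log space; the exponent is minus the slope.
xs = [math.log(k) for k in ks]
ys = [math.log(f) for f in fail]
n = len(ks)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
alpha_est = -slope
print(f"estimated exponent: {alpha_est:.3f}")  # → estimated exponent: 0.700
```

On real measurements the fitted exponent would carry noise, but the same log-log regression applies.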

📝 Abstract
The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically observed power-law behavior of pass@k leads to sublinear growth of coverage@cost (diminishing returns). To solve this problem, we propose Reset-and-Discard (ReD), a query method for LLMs that increases coverage@cost at any given budget, regardless of the form of pass@k. Moreover, given a pass@k curve, we can quantitatively predict the savings in the total number of attempts achieved by ReD. If pass@k is not available for the model, ReD can infer its power-law exponent. Experiments on three LLMs using HumanEval demonstrate that ReD substantially reduces the attempts, tokens, and USD cost required to reach a desired coverage, while also offering an efficient way to measure inference power laws.
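The abstract's claim that repeated sampling yields diminishing returns in coverage@cost can be illustrated with a small Monte Carlo sketch. Everything below is an assumption for illustration: the question count `N`, the heavy-tailed distribution of per-attempt success probabilities, and the naive "retry each question up to k times" strategy are hypothetical stand-ins, not the paper's setup or the ReD algorithm itself:

```python
import random

random.seed(0)

# Hypothetical pool: each question has a fixed per-attempt success
# probability p; a heavy-tailed spread of p values makes the aggregate
# pass@k saturate slowly, mimicking the power-law regime.
N = 2000
probs = [random.random() ** 4 for _ in range(N)]  # many near-zero p values

def coverage_at_cost(k):
    """Naive strategy: query each question up to k times; stop on success.
    Returns (unique questions solved, total attempts spent)."""
    solved, cost = 0, 0
    for p in probs:
        for _ in range(k):
            cost += 1
            if random.random() < p:
                solved += 1
                break
    return solved, cost

for k in (1, 4, 16, 64):
    solved, cost = coverage_at_cost(k)
    print(f"k={k:3d}  attempts={cost:6d}  unique solved={solved}")
```

Running this shows the sublinear pattern the abstract describes: each quadrupling of the attempt budget buys progressively fewer newly solved questions per attempt, which is the inefficiency a budget-aware strategy like ReD is designed to reduce.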
Problem

Research questions and friction points this paper is trying to address.

large language models
fixed budget
coverage@cost
diminishing returns
inference efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reset-and-Discard
coverage@cost
pass@k
power-law inference
large language models