Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling

📅 2025-05-30
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This paper addresses inefficient rollout budget allocation in test-time scaling (TTS). We formulate inference-time search as a resource allocation optimization problem. First, we identify and characterize the detrimental impact of candidate-number bias in solution-level allocation, which skews compute toward reasoning directions with more candidates. To overcome this, we propose Direction-Oriented Resource Allocation (DORA): a principled framework grounded in probabilistic modeling and optimal stopping theory that decouples direction-quality estimation from solution-path generation and introduces a dynamic rollout allocation algorithm tailored to tree search. We theoretically establish that DORA achieves Pareto improvements in both computational efficiency and correctness. Empirically, DORA attains state-of-the-art accuracy on mathematical reasoning benchmarks, including MATH500, AIME2024, and AIME2025, outperforming Chain-of-Thought, Best-of-N, and adaptive search baselines under equivalent computational budgets.

📝 Abstract
Test-Time Scaling (TTS) improves the performance of Large Language Models (LLMs) by using additional inference-time computation to explore multiple reasoning paths through search. Yet how to allocate a fixed rollout budget most effectively during search remains underexplored, often resulting in inefficient use of compute at test time. To bridge this gap, we formulate test-time search as a resource allocation problem and derive the optimal allocation strategy that maximizes the probability of obtaining a correct solution under a fixed rollout budget. Within this formulation, we reveal a core limitation of existing search methods: solution-level allocation tends to favor reasoning directions with more candidates, leading to theoretically suboptimal and inefficient use of compute. To address this, we propose Direction-Oriented Resource Allocation (DORA), a provably optimal method that mitigates this bias by decoupling direction quality from candidate count and allocating resources at the direction level. To demonstrate DORA's effectiveness, we conduct extensive experiments on challenging mathematical reasoning benchmarks including MATH500, AIME2024, and AIME2025. The empirical results show that DORA consistently outperforms strong baselines with comparable computational cost, achieving state-of-the-art accuracy. We hope our findings contribute to a broader understanding of optimal TTS for LLMs.
Problem

Research questions and friction points this paper is trying to address.

Optimizing rollout budget allocation for test-time scaling
Addressing inefficient compute usage during LLM reasoning search
Overcoming solution-level allocation bias in reasoning path selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direction-level resource allocation for test-time scaling
Decouples direction quality from candidate count
Provably optimal rollout budget utilization
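The candidate-number bias and its direction-level fix can be illustrated with a toy sketch. This is not the paper's actual DORA algorithm (which involves probabilistic modeling and optimal stopping); it is a minimal, assumed softmax-style allocation showing how solution-level allocation rewards directions simply for having more candidates, while direction-level allocation scores each direction by its estimated quality alone. The function names, scores, and budget are hypothetical.

```python
import math

def solution_level_allocation(scores_by_direction, budget):
    # Solution-level: every candidate competes individually under one
    # softmax, so a direction with more candidates absorbs more of the
    # budget regardless of its average quality (the bias in question).
    flat = [(d, s) for d, scores in scores_by_direction.items() for s in scores]
    z = sum(math.exp(s) for _, s in flat)
    alloc = {d: 0.0 for d in scores_by_direction}
    for d, s in flat:
        alloc[d] += budget * math.exp(s) / z
    return alloc

def direction_level_allocation(scores_by_direction, budget):
    # Direction-level (DORA-style, greatly simplified): estimate each
    # direction's quality from its mean candidate score, then allocate
    # the budget across directions, independent of candidate count.
    quality = {d: sum(s) / len(s) for d, s in scores_by_direction.items()}
    z = sum(math.exp(q) for q in quality.values())
    return {d: budget * math.exp(q) / z for d, q in quality.items()}

# Direction A has one strong candidate; direction B has three mediocre
# ones. Solution-level allocation favors B purely due to candidate
# count; direction-level allocation favors the higher-quality A.
scores = {"A": [2.0], "B": [1.0, 1.0, 1.0]}
sol = solution_level_allocation(scores, budget=100)
dirn = direction_level_allocation(scores, budget=100)
```

With these numbers, solution-level allocation gives direction B the larger share even though its candidates score lower, while direction-level allocation concentrates the budget on A.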