ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses key limitations of existing test-time compute (TTC) scaling methods: fragmented strategy design, inconsistent evaluation protocols, and little joint analysis of quality and computational cost. The authors propose ThinkBooster, a unified framework that combines a modular library of TTC scaling strategies and scorers, a benchmark that jointly evaluates performance and computational efficiency, an OpenAI-compatible proxy service, and a demo visual debugger, enabling drop-in integration of adaptive reasoning into real-world applications. Experiments on mathematical and coding tasks reveal the performance-compute trade-offs of TTC scaling strategies (such as multi-sample generation and verifier-based reranking) and scoring methods, and show that the framework provides practical gains in real-world tasks.

📝 Abstract

Test-time compute (TTC) scaling has emerged as a powerful paradigm for improving large language model (LLM) reasoning by allocating additional compute during inference, e.g., via multi-sample generation and verifier-based reranking. Existing TTC scaling strategies and reasoning scorers remain fragmented, are evaluated under inconsistent protocols, and are rarely analyzed through the lens of quality-cost trade-offs. We introduce ThinkBooster, a unified framework for seamless test-time compute scaling of LLM reasoning, which consists of (i) a modular Python library implementing state-of-the-art TTC scaling strategy and scorer families, (ii) a benchmark that jointly evaluates performance and computational efficiency, and (iii) a deployable OpenAI-compatible proxy service that enables drop-in integration of adaptive reasoning into real-world applications. We further provide a demo visual debugger for inspecting reasoning trajectories, intermediate selection decisions, and alternative reasoning paths. Empirical results on mathematical and coding tasks reveal the performance-compute trade-offs of TTC scaling strategies and scoring methods and demonstrate that ThinkBooster provides practical gains in real-world tasks. The code is available online under an MIT license.
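
To make "multi-sample generation and verifier-based reranking" and the quality-cost trade-off concrete, here is a minimal best-of-N sketch in Python. It is not ThinkBooster's API: the names `Candidate`, `best_of_n`, `make_toy_generator`, and `toy_score` are hypothetical, the generator and scorer are toy stand-ins for an LLM and a verifier, and token count is assumed as the cost measure.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Tuple


@dataclass
class Candidate:
    text: str    # the sampled answer
    tokens: int  # tokens spent producing it (assumed cost proxy)


def best_of_n(
    prompt: str,
    generate: Callable[[str], Candidate],
    score: Callable[[str, str], float],
    n: int,
) -> Tuple[Candidate, int]:
    """Sample n candidates, rerank them with a scorer, and report total cost."""
    candidates = [generate(prompt) for _ in range(n)]
    total_tokens = sum(c.tokens for c in candidates)
    # Verifier-based reranking: keep the candidate with the highest score.
    best = max(candidates, key=lambda c: score(prompt, c.text))
    return best, total_tokens


def make_toy_generator() -> Callable[[str], Candidate]:
    """Hypothetical stand-in for an LLM: cycles through fixed answers."""
    answers = cycle(["5", "22", "4"])

    def toy_generate(prompt: str) -> Candidate:
        # Every sample is charged a flat 10 tokens in this toy setup.
        return Candidate(text=next(answers), tokens=10)

    return toy_generate


def toy_score(prompt: str, answer: str) -> float:
    """Hypothetical stand-in for a verifier: rewards the correct answer to 2 + 2."""
    return 1.0 if answer == "4" else 0.0


if __name__ == "__main__":
    # Larger n buys a better chance of a correct answer at a higher token cost.
    for n in (1, 2, 3):
        best, cost = best_of_n("What is 2 + 2?", make_toy_generator(), toy_score, n)
        print(f"n={n} answer={best.text} tokens={cost}")
```

Sweeping `n` and recording both the selected answer and the tokens spent is the kind of joint quality-cost measurement the abstract describes, though the actual benchmark may track cost differently.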

Problem

Research questions and friction points this paper is trying to address.

test-time compute scaling

LLM reasoning

quality-cost trade-offs

reasoning scorers

fragmented strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time compute scaling

LLM reasoning

unified framework