FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness

📅 2026-01-04

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational cost of large language model training by proposing a test-time compute (TTC)-aware training strategy that explicitly incorporates TTC into early stopping decisions. For the first time, the approach jointly selects an intermediate checkpoint and a TTC configuration, trading training compute against inference compute. It introduces an efficient method for evaluating TTC configurations that avoids exhaustive search, along with a break-even bound that identifies when additional inference FLOPs offset the training FLOPs saved, which guides resource allocation. Experiments show that the proposed method reduces training FLOPs by up to 92% across multiple tasks while maintaining or even improving model accuracy, which shortens model deployment and iteration cycles.
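
The break-even idea can be made concrete with back-of-the-envelope arithmetic. The sketch below is not the paper's bound, which this page does not state. It assumes the widely used approximations of about 6 FLOPs per parameter per training token and about 2 FLOPs per parameter per generated token, and it assumes TTC means drawing several samples per query. All function and parameter names are hypothetical.

```python
def training_flops(params: float, tokens: float) -> float:
    """Assumed approximation: ~6 FLOPs per parameter per training token."""
    return 6.0 * params * tokens


def inference_flops_per_query(params: float, gen_tokens: float, samples: int) -> float:
    """Assumed approximation: ~2 FLOPs per parameter per generated token,
    multiplied by the number of samples drawn for each query."""
    return 2.0 * params * gen_tokens * samples


def break_even_queries(params: float, full_tokens: float, ckpt_tokens: float,
                       gen_tokens: float, samples: int) -> float:
    """Number of served queries at which the extra inference FLOPs of the
    early checkpoint (with `samples` draws per query) equal the training
    FLOPs saved by stopping at `ckpt_tokens` instead of `full_tokens`.
    The fully trained model is assumed to use a single sample per query."""
    saved = training_flops(params, full_tokens) - training_flops(params, ckpt_tokens)
    extra = (inference_flops_per_query(params, gen_tokens, samples)
             - inference_flops_per_query(params, gen_tokens, 1))
    if extra <= 0:
        # No extra inference cost, so stopping early never costs more overall.
        return float("inf")
    return saved / extra


# Illustrative, made-up numbers: a 1B-parameter model stopped after 20B of
# 100B training tokens, answering with 8 samples of 500 tokens each.
queries = break_even_queries(1e9, 1e11, 2e10, 500, 8)
```

Below `queries` served requests, the early checkpoint with extra sampling uses fewer total FLOPs under these assumptions. Above it, the fully trained model is cheaper overall.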

📝 Abstract

Scaling training compute, measured in FLOPs, has long been shown to improve the accuracy of large language models, yet training remains resource-intensive. Prior work shows that increasing test-time compute (TTC), for example through iterative sampling, can allow smaller models to rival or surpass much larger ones at lower overall cost. We introduce TTC-aware training, where an intermediate checkpoint and a corresponding TTC configuration can together match or exceed the accuracy of a fully trained model while requiring substantially fewer training FLOPs. Building on this insight, we propose an early stopping algorithm that jointly selects a checkpoint and TTC configuration to minimize training compute without sacrificing accuracy. To make this practical, we develop an efficient TTC evaluation method that avoids exhaustive search, and we formalize a break-even bound that identifies when increased inference compute compensates for reduced training compute. Experiments demonstrate up to 92% reductions in training FLOPs while maintaining, and sometimes remarkably improving, accuracy. These results highlight a new perspective for balancing training and inference compute in model development, enabling faster deployment cycles and more frequent model refreshes. Code will be publicly released.
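
The joint selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it scans a small, already-evaluated table of (checkpoint, TTC configuration) pairs, whereas the paper states that its evaluation method avoids exhaustive search. It also assumes a TTC configuration is simply a number of samples per query. The `Candidate` type, the `select_early_stop` function, and every number in the table are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """One evaluated (checkpoint, TTC configuration) pair (hypothetical type)."""
    train_flops: float  # training FLOPs spent to reach this checkpoint
    samples: int        # TTC configuration: samples drawn per query (assumption)
    accuracy: float     # measured accuracy of this pair on a validation set


def select_early_stop(candidates: Sequence[Candidate],
                      target_accuracy: float,
                      max_samples: int) -> Optional[Candidate]:
    """Return the pair with the fewest training FLOPs that reaches the target
    accuracy within the inference budget, or None if no pair qualifies.
    Ties on training FLOPs are broken in favor of fewer samples."""
    feasible = [c for c in candidates
                if c.accuracy >= target_accuracy and c.samples <= max_samples]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (c.train_flops, c.samples))


# Made-up evaluation table; the last row plays the role of the fully trained model.
table = [
    Candidate(1e20, 1, 0.40),
    Candidate(1e20, 8, 0.55),
    Candidate(3e20, 1, 0.50),
    Candidate(3e20, 4, 0.62),
    Candidate(6e20, 1, 0.60),
]
choice = select_early_stop(table, target_accuracy=0.60, max_samples=8)
```

With this made-up table, the selected pair is the middle checkpoint with 4 samples: it matches the fully trained model's accuracy at half the training FLOPs. Tightening `max_samples` to 2 leaves only the fully trained model as feasible.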

Problem

Research questions and friction points this paper is trying to address.

FLOP-Efficient Training

Early Stopping

Test-Time Compute

Training-Compute Reduction

Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

TTC-aware training

early stopping

training FLOPs reduction

test-time compute

compute trade-off
