FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness

📅 2026-01-04
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of large language model training by proposing a test-time compute (TTC)-aware training strategy that explicitly incorporates TTC into early stopping decisions. The approach is the first to jointly optimize intermediate checkpoints and TTC configurations, trading training compute against inference compute. An efficient TTC evaluation method avoids exhaustive search over configurations, and a break-even boundary characterizes when additional inference FLOPs compensate for reduced training FLOPs, guiding resource allocation. Experiments show that the proposed method reduces training FLOPs by up to 92% across multiple tasks while maintaining or even improving model accuracy, substantially accelerating model deployment and iteration cycles.
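The break-even boundary itself is not spelled out on this page, but a back-of-the-envelope version can be sketched: stopping training early pays off when the training FLOPs saved exceed the extra inference FLOPs that the heavier TTC configuration will spend over the model's deployment lifetime. The sketch below is a minimal illustration of that accounting, assuming the common ~6ND training and ~2N-per-token inference FLOP estimates; the function names, the sample-count TTC configuration, and the query-volume parameter are hypothetical and not the paper's formulation.

```python
# Minimal sketch of a break-even check between training FLOPs saved by early
# stopping and extra inference FLOPs spent by a heavier test-time-compute (TTC)
# configuration. Assumes the common approximations: training ~ 6*N*D FLOPs and
# inference ~ 2*N FLOPs per generated token (N = parameters, D = training tokens).
# All names and numbers are illustrative, not the paper's exact bound.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training cost: ~6 FLOPs per parameter per training token."""
    return 6.0 * n_params * n_tokens

def inference_flops(n_params: float, tokens_per_answer: float, samples: int) -> float:
    """Approximate cost of answering one query with `samples` TTC samples."""
    return 2.0 * n_params * tokens_per_answer * samples

def breaks_even(n_params, full_tokens, early_tokens,
                tokens_per_answer, extra_samples, expected_queries) -> bool:
    """True if the training FLOPs saved by stopping early cover the extra
    inference FLOPs added by the larger TTC budget over the deployment lifetime."""
    saved = training_flops(n_params, full_tokens) - training_flops(n_params, early_tokens)
    extra = inference_flops(n_params, tokens_per_answer, extra_samples) * expected_queries
    return saved >= extra

if __name__ == "__main__":
    # Example: a 7B model stopped at 1T instead of 2T training tokens,
    # compensated with 4 extra samples per query at ~500 generated tokens each,
    # amortized over an assumed 100M lifetime queries.
    print(breaks_even(7e9, 2e12, 1e12,
                      tokens_per_answer=500, extra_samples=4,
                      expected_queries=1e8))
```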

📝 Abstract
Scaling training compute, measured in FLOPs, has long been shown to improve the accuracy of large language models, yet training remains resource-intensive. Prior work shows that increasing test-time compute (TTC), for example through iterative sampling, can allow smaller models to rival or surpass much larger ones at lower overall cost. We introduce TTC-aware training, where an intermediate checkpoint and a corresponding TTC configuration can together match or exceed the accuracy of a fully trained model while requiring substantially fewer training FLOPs. Building on this insight, we propose an early stopping algorithm that jointly selects a checkpoint and TTC configuration to minimize training compute without sacrificing accuracy. To make this practical, we develop an efficient TTC evaluation method that avoids exhaustive search, and we formalize a break-even bound that identifies when increased inference compute compensates for reduced training compute. Experiments demonstrate up to 92% reductions in training FLOPs while maintaining, and in some cases notably improving, accuracy. These results highlight a new perspective on balancing training and inference compute in model development, enabling faster deployment cycles and more frequent model refreshes. Code will be publicly released.
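The abstract's joint checkpoint/TTC early-stopping procedure is not detailed on this page. The sketch below shows one plausible shape of such a loop, assuming a list of saved checkpoints in training order, a small grid of TTC configurations (here, number of samples per query), and a user-supplied evaluate(checkpoint, config) callback; all names and the search order are assumptions for illustration, not the paper's algorithm.

```python
# Hypothetical sketch of TTC-aware early stopping: walk training checkpoints in
# order, try a small grid of TTC configurations on a validation set, and stop as
# soon as some (checkpoint, TTC config) pair reaches the target accuracy.
# `evaluate` is a user-supplied callback; nothing here reproduces the paper's
# actual search procedure or break-even criterion.
from typing import Callable, Iterable, Optional, Tuple

def ttc_aware_early_stop(
    checkpoints: Iterable[str],                # checkpoint paths, earliest first
    ttc_configs: Iterable[int],                # e.g. number of samples per query
    evaluate: Callable[[str, int], float],     # returns validation accuracy
    target_accuracy: float,
) -> Optional[Tuple[str, int, float]]:
    """Return the earliest (checkpoint, ttc_config, accuracy) meeting the target,
    or None if no pair qualifies. Earlier checkpoints mean fewer training FLOPs;
    larger TTC configs spend inference compute to close the remaining accuracy gap."""
    for ckpt in checkpoints:                   # cheapest (earliest) checkpoints first
        for samples in sorted(ttc_configs):    # cheapest TTC budget first
            acc = evaluate(ckpt, samples)
            if acc >= target_accuracy:
                return ckpt, samples, acc      # stop training here; deploy with this TTC
    return None
```

In practice the inner loop would also be gated by a break-even budget like the one sketched earlier, so that a TTC configuration is only accepted if its lifetime inference cost stays below the training FLOPs it saves.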
Problem

Research questions and friction points this paper is trying to address.

FLOP-Efficient Training
Early Stopping
Test-Time Compute
Training-Compute Reduction
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

TTC-aware training
early stopping
training FLOPs reduction
test-time compute
compute trade-off
Hossam Amer
Ascend Team, Huawei Technologies, Toronto, Canada
Maryam Dialameh
Ascend Team, Huawei Technologies, Toronto, Canada
Hossein Rajabzadeh
Ascend Team, Huawei Technologies, Toronto, Canada
Walid Ahmed
Huawei Technologies Canada
Deep Learning · Machine Learning · Soft Computing
Weiwei Zhang
Huawei Canada Research
Large Language Models · Natural Language Processing · Machine Learning
Yang Liu
Ascend Team, Huawei Technologies, Toronto, Canada