🤖 AI Summary
Successive Halving (SH) relies heavily on early performance evaluations, which can prematurely prune slow-starting models. Method: We propose LKGPSH, the first method to employ Latent Kronecker Gaussian Processes (LKGP) for learning curve modeling—combining Bayesian learning curve prediction with multi-task regression to predict final validation performance from incomplete training trajectories, and using these predictions to guide SH’s resource allocation and elimination decisions. Contribution/Results: LKGPSH substantially mitigates overreliance on intermediate metrics, improving retention of slow-starting configurations. Empirical evaluation across multiple benchmarks demonstrates competitive performance; however, the approach is not Pareto optimal in resource efficiency relative to standard SH, because it requires fully observed learning curves as training data. This cost could be reduced by transferring knowledge from existing learning curve data.
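The prediction step described above can be illustrated with a simple extrapolation model. The sketch below ranks candidates by *predicted final* score rather than current score; a least-squares power-law fit stands in for the paper's LKGP model (which would also provide predictive uncertainty). The function names, the t^(-1/2) basis, and the `eta` parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def predict_final_score(partial_curve, total_epochs):
    """Extrapolate a partial learning curve to its value at total_epochs.

    Fits score(t) ~= a + b * t**-0.5 by least squares; this simple
    parametric stand-in replaces the paper's LKGP learning curve model.
    """
    t = np.arange(1, len(partial_curve) + 1, dtype=float)
    # Linear model in the basis [1, t**-0.5].
    X = np.stack([np.ones_like(t), t ** -0.5], axis=1)
    a, b = np.linalg.lstsq(X, np.asarray(partial_curve, dtype=float), rcond=None)[0]
    return a + b * total_epochs ** -0.5

def predictive_halving_round(survivors, curves, total_epochs, eta=2):
    """One elimination round: keep the top 1/eta by predicted final score."""
    ranked = sorted(
        survivors,
        key=lambda c: predict_final_score(curves[c], total_epochs),
        reverse=True,
    )
    return ranked[: max(1, len(ranked) // eta)]
```

On curves generated from the fitted family, a slow starter with a higher asymptote is retained even though it trails on current score, which is exactly the failure mode of standard SH that the paper targets.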
📝 Abstract
Successive Halving is a popular algorithm for hyperparameter optimization that allocates exponentially more resources to promising candidates. However, the algorithm typically relies on intermediate performance values to make resource allocation decisions, which can cause it to prematurely prune slow starters that would eventually have become the best candidate. We investigate whether guiding Successive Halving with learning curve predictions based on Latent Kronecker Gaussian Processes can overcome this limitation. In a large-scale empirical study involving different neural network architectures and a click prediction dataset, we compare this predictive approach to the standard approach based on current performance values. Our experiments show that, although the predictive approach achieves competitive performance, it is not Pareto optimal compared to investing more resources into the standard approach, because it requires fully observed learning curves as training data. However, this downside could be mitigated by leveraging existing learning curve data.
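The standard algorithm the abstract contrasts against can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `evaluate(config, budget)` interface, the halving factor `eta`, and the starting budget are all assumptions for the sketch.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Minimal Successive Halving sketch.

    configs:    list of candidate hyperparameter configurations
    evaluate:   evaluate(config, budget) -> validation score (higher is better);
                this callback interface is an illustrative assumption
    min_budget: resources given to every candidate in the first round
    eta:        keep the top 1/eta each round; budget grows by a factor of eta
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Rank survivors by their *current* score at this budget -- the
        # intermediate values that can mislead the algorithm about slow starters.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Halve the field and allocate eta times more resources to the rest.
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]
```

Because elimination uses the score at the current budget only, a configuration whose curve starts low but ends high is dropped in an early round; the predictive variant studied in the paper replaces that ranking criterion with a forecast of the final score.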