Efficient Hyperparameter Search for Non-Stationary Model Training

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Online learning systems face prohibitively high hyperparameter search costs due to persistent data distribution shifts. This paper proposes a two-stage efficient hyperparameter search paradigm tailored for non-stationary sequential data: Stage I employs lightweight data summarization and sequence forecasting to rapidly identify high-potential configurations; Stage II performs full training only on the shortlisted candidates. Departing from conventional performance-maximization search strategies, our approach prioritizes early, accurate pruning—significantly reducing redundant computation. Evaluated on the Criteo 1TB dataset, it achieves up to 10× reduction in search cost. Its efficacy and generalizability are further validated in large-scale industrial advertising systems. The core innovation lies in reframing hyperparameter optimization—from static performance tuning to dynamic, adaptivity-driven candidate screening—thereby overcoming the fundamental limitations of traditional methods in time-varying environments.

📝 Abstract
Online learning is the cornerstone of applications like recommendation and advertising systems, where models continuously adapt to shifting data distributions. Model training for such systems is remarkably expensive, a cost that multiplies during hyperparameter search. We introduce a two-stage paradigm to reduce this cost: (1) efficiently identifying the most promising configurations, and then (2) training only these selected candidates to their full potential. Our core insight is that focusing on accurate identification in the first stage, rather than achieving peak performance, allows for aggressive cost-saving measures. We develop novel data reduction and prediction strategies that specifically overcome the challenges of sequential, non-stationary data not addressed by conventional hyperparameter optimization. We validate our framework's effectiveness through a dual evaluation: first on the Criteo 1TB dataset, the largest suitable public benchmark, and second on an industrial advertising system operating at a scale two orders of magnitude larger. Our methods reduce the total hyperparameter search cost by up to 10× on the public benchmark and deliver significant, validated efficiency gains in the industrial setting.
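The two-stage idea can be illustrated with a minimal sketch. This is not the paper's actual method: the paper's Stage I uses data summarization and sequence forecasting, which is stood in for here by a hypothetical cheap, noisy `proxy_score`; the configuration grid, score functions, and `keep_k` are all illustrative assumptions.

```python
import random

random.seed(0)  # make the noisy proxy reproducible

def proxy_score(config):
    # Stage I stand-in: a cheap, noisy estimate of a config's quality.
    # In the paper this role is played by data reduction + forecasting;
    # here it is a toy quadratic plus Gaussian noise (assumption).
    true = -(config["lr"] - 0.1) ** 2 - 0.01 * config["l2"]
    return true + random.gauss(0, 1e-3)

def full_train_score(config):
    # Stage II stand-in: the expensive full training run,
    # paid only for shortlisted candidates.
    return -(config["lr"] - 0.1) ** 2 - 0.01 * config["l2"]

def two_stage_search(configs, keep_k=3):
    # Stage I: rank every configuration with the cheap proxy.
    ranked = sorted(configs, key=proxy_score, reverse=True)
    shortlist = ranked[:keep_k]
    # Stage II: fully train only the shortlisted candidates.
    scored = [(full_train_score(c), c) for c in shortlist]
    return max(scored, key=lambda t: t[0])[1]

grid = [{"lr": lr, "l2": l2}
        for lr in (0.01, 0.05, 0.1, 0.5)
        for l2 in (0.0, 0.1)]
best = two_stage_search(grid)
```

The cost saving comes from the ratio of Stage I to Stage II work: all eight configurations pay only the cheap proxy, while just three pay the full-training cost, mirroring the identification-over-peak-performance framing in the abstract.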
Problem

Research questions and friction points this paper is trying to address.

Reducing hyperparameter search cost for online learning models
Addressing non-stationary data challenges in hyperparameter optimization
Efficiently identifying promising configurations for model training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage hyperparameter search for cost reduction
Novel data reduction and prediction strategies
Focus on accurate configuration identification over peak performance
Berivan Isik
Google DeepMind

Matthew Fahrbach
Google Research
Algorithms, Discrete Mathematics, Machine Learning, Optimization

Dima Kuzmin
Google, Inc
Machine Learning

Nicolas Mayoraz
Google Research

Emil Praun
Google Research

Steffen Rendle
Google

Raghavendra Vasudeva
Google Research