Measuring Time Series Forecast Stability for Demand Planning

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the critical issue of prediction instability in time-series forecasting models for supply chain demand planning—where identical inputs yield highly variable outputs, increasing manual intervention and eroding planner trust. Departing from conventional accuracy-centric evaluation paradigms, we formally define and prioritize *prediction stability*—quantified as the variance of repeated forecasts over fixed inputs—as a key operational deployment metric, and systematically investigate how intrinsic model stochasticity undermines output consistency. Experiments on the M5 and Favorita benchmarks compare state-of-the-art models including Chronos, DeepAR, PatchTST, TFT, TiDE, and AutoGluon. Results demonstrate that ensemble-based approaches significantly reduce forecast variance while preserving accuracy, thereby enhancing stability. Our work establishes stability as a first-class criterion for model selection and deployment in production, advancing the field from an “accuracy-first” to a “stability-accuracy balanced” paradigm.

📝 Abstract
Time series forecasting is a critical first step in generating demand plans for supply chains. Experiments on time series models typically focus on demonstrating improvements in forecast accuracy over existing/baseline solutions, quantified according to some accuracy metric. There is no doubt that forecast accuracy is important; however, in production systems, demand planners often value consistency and stability over incremental accuracy improvements. Assuming that the inputs have not changed significantly, forecasts that vary drastically from one planning cycle to the next require substantial human intervention, which frustrates demand planners and can even cause them to lose trust in ML forecasting models. We study model-induced stochasticity, which quantifies the variance of a set of forecasts produced by a single model when the set of inputs is fixed. Models with lower variance are more stable. Recently the forecasting community has seen significant advances in forecast accuracy through the development of deep machine learning models for time series forecasting. We perform a case study measuring the stability and accuracy of state-of-the-art forecasting models (Chronos, DeepAR, PatchTST, Temporal Fusion Transformer, TiDE, and the AutoGluon best-quality ensemble) on public data sets from the M5 competition and Favorita grocery sales. We show that ensemble models improve stability without significantly deteriorating (or even while improving) forecast accuracy. While these results may not be surprising, the main point of this paper is to propose the need for further study of forecast stability for models that are being deployed in production systems.
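The abstract defines model-induced stochasticity as the variance of repeated forecasts from one model when the inputs are held fixed. A minimal sketch of such a metric (the function name and the aggregation over series and horizon are our assumptions, not the paper's exact formulation):

```python
import numpy as np

def forecast_stability(forecasts: np.ndarray) -> float:
    """Mean variance across repeated runs of one model on fixed inputs.

    forecasts: array of shape (n_runs, n_series, horizon), where each run
    is a full retrain-and-predict cycle with the same data but a different
    random seed. Lower values indicate a more stable model.
    """
    # Variance over the run axis, then averaged over series and horizon.
    return float(np.mean(np.var(forecasts, axis=0)))

# Hypothetical illustration: a stochastic model whose runs differ only by
# seed-dependent noise around a fixed underlying forecast.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 2 * np.pi, 12))            # fixed "true" forecast
runs = signal + rng.normal(scale=0.1, size=(20, 1, 12))   # 20 repeated runs
print(forecast_stability(runs))  # ≈ 0.01, i.e. the injected noise variance
```

A perfectly deterministic model would score exactly zero on this metric; how to weight the per-series variances (e.g. scaling by demand volume) is a design choice the paper leaves open.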
Problem

Research questions and friction points this paper is trying to address.

Assessing stability of time series forecasts for demand planning
Comparing stability and accuracy of advanced forecasting models
Proposing further research on forecast stability in production systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Measure model-induced stochasticity for stability
Compare stability and accuracy of top models
Ensemble models enhance stability and accuracy