🤖 AI Summary
Direct point-wise loss functions (e.g., MSE) in time series forecasting suffer from conditional distribution bias due to label autocorrelation, leading to poor probabilistic calibration and degraded long-horizon performance.
Method: This paper proposes a novel training paradigm based on joint-distribution Wasserstein alignment. It introduces a differentiable, trainable joint Wasserstein distance as a surrogate objective—provably upper-bounding the conditional distribution discrepancy—and employs empirical sample estimation with alternating optimization for efficient gradient-based learning.
Contribution/Results: The method is model-agnostic and plug-and-play, requiring no architectural modifications. It consistently improves both point prediction accuracy and distributional consistency across diverse forecasting models. Empirical evaluation on multiple benchmark datasets demonstrates state-of-the-art performance, particularly in long-term forecasting, where it significantly enhances predictive reliability and calibration.
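The paper's exact estimator is not reproduced here, but the core idea — a differentiable Wasserstein-style discrepancy between the *joint* samples (input, forecast) and (input, label) — can be sketched with an entropy-regularized optimal-transport (Sinkhorn) cost, a common differentiable surrogate for the Wasserstein distance. The function names, the choice of Sinkhorn, and the parameters `eps` and `n_iter` below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sinkhorn_cost(a_samples, b_samples, eps=1.0, n_iter=200):
    """Entropy-regularized OT cost <P, C> between two empirical samples.

    A differentiable surrogate for the Wasserstein distance; the paper's
    actual joint-distribution estimator may differ.
    """
    # pairwise squared-Euclidean cost matrix between samples
    diff = a_samples[:, None, :] - b_samples[None, :, :]
    C = np.sum(diff ** 2, axis=-1)
    n, m = C.shape
    mu, nu = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):  # Sinkhorn fixed-point iterations
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # entropic transport plan
    return float(np.sum(P * C))

def joint_wasserstein(x, y_pred, y_true, eps=1.0):
    """Discrepancy between joint samples (x, y_pred) and (x, y_true).

    Comparing joints rather than conditionals is what makes the quantity
    estimable from finite samples (and, per the paper, an upper bound on
    the conditional discrepancy).
    """
    a = np.concatenate([x, y_pred], axis=1)
    b = np.concatenate([x, y_true], axis=1)
    return sinkhorn_cost(a, b, eps=eps)
```

Because the cost is a smooth function of `y_pred`, the same computation written in an autodiff framework (e.g., PyTorch or JAX) yields gradients for training, which is what makes the objective plug-and-play for any forecaster.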
📝 Abstract
Training time-series forecast models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach resorts to minimizing the conditional negative log-likelihood of the label sequence, typically estimated using the mean squared error. However, this estimate proves to be biased in the presence of label autocorrelation. In this paper, we propose DistDF, which achieves alignment by alternately minimizing a discrepancy between the conditional forecast and label distributions. Because conditional discrepancies are difficult to estimate from finite time-series observations, we introduce a newly proposed joint-distribution Wasserstein discrepancy for time-series forecasting, which provably upper bounds the conditional discrepancy of interest. This discrepancy admits tractable, differentiable estimation from empirical samples and integrates seamlessly with gradient-based training. Extensive experiments show that DistDF improves the performance of diverse forecast models and achieves state-of-the-art forecasting performance. Code is available at https://anonymous.4open.science/r/DistDF-F66B.
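The claim that the discrepancy "integrates seamlessly with gradient-based training" amounts to adding a distributional term to the usual point-wise loss, leaving the model architecture untouched. A minimal sketch follows; the 1-D sorted-sample Wasserstein-1 used here is a simple stand-in for the paper's joint-distribution discrepancy, and the weighting parameter `lam` is a hypothetical knob, not one specified in the paper:

```python
import numpy as np

def w1_empirical(p, q):
    # 1-D empirical Wasserstein-1 via sorted samples (closed form for
    # equal-size samples); a placeholder for the paper's discrepancy.
    return float(np.mean(np.abs(np.sort(p) - np.sort(q))))

def distdf_style_loss(y_pred, y_true, lam=0.1):
    # point-wise term (standard DF objective) ...
    mse = float(np.mean((y_pred - y_true) ** 2))
    # ... plus a distributional alignment term over the flattened samples
    return mse + lam * w1_empirical(y_pred.ravel(), y_true.ravel())
```

Any forecaster trained by gradient descent could minimize such a combined objective in place of plain MSE, which is the sense in which the method is model-agnostic.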