🤖 AI Summary
Direct point-wise loss functions (e.g., MSE) in time series forecasting suffer from conditional distribution bias due to label autocorrelation, leading to poor probabilistic calibration and degraded long-horizon performance.
Method: This paper proposes a novel training paradigm based on joint-distribution Wasserstein alignment. It introduces a differentiable, trainable joint Wasserstein distance as a surrogate objective—provably upper-bounding the conditional distribution discrepancy—and employs empirical sample estimation with alternating optimization for efficient gradient-based learning.
Contribution/Results: The method is model-agnostic and plug-and-play, requiring no architectural modifications. It consistently improves both point prediction accuracy and distributional consistency across diverse forecasting models. Empirical evaluation on multiple benchmark datasets demonstrates state-of-the-art performance, particularly in long-term forecasting, where it significantly enhances predictive reliability and calibration.
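The paper's exact estimator is not reproduced here, but the core idea — a differentiable Wasserstein-style discrepancy between the *joint* samples (input, forecast) and (input, label) — can be sketched with an entropy-regularized optimal-transport (Sinkhorn) cost, a common differentiable surrogate for the Wasserstein distance. The function names, the choice of Sinkhorn, and the parameters `eps` and `n_iter` below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sinkhorn_cost(a_samples, b_samples, eps=1.0, n_iter=200):
    """Entropy-regularized OT cost <P, C> between two empirical samples.

    A differentiable surrogate for the Wasserstein distance; the paper's
    actual joint-distribution estimator may differ.
    """
    # pairwise squared-Euclidean cost matrix between samples
    diff = a_samples[:, None, :] - b_samples[None, :, :]
    C = np.sum(diff ** 2, axis=-1)
    n, m = C.shape
    mu, nu = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):  # Sinkhorn fixed-point iterations
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # entropic transport plan
    return float(np.sum(P * C))

def joint_wasserstein(x, y_pred, y_true, eps=1.0):
    """Discrepancy between joint samples (x, y_pred) and (x, y_true).

    Comparing joints rather than conditionals is what makes the quantity
    estimable from finite samples (and, per the paper, an upper bound on
    the conditional discrepancy).
    """
    a = np.concatenate([x, y_pred], axis=1)
    b = np.concatenate([x, y_true], axis=1)
    return sinkhorn_cost(a, b, eps=eps)
```

Because the cost is a smooth function of `y_pred`, the same computation written in an autodiff framework (e.g., PyTorch or JAX) yields gradients for training, which is what makes the objective plug-and-play for any forecaster.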
📝 Abstract
Training time-series forecast models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach resorts to minimizing the conditional negative log-likelihood of the label sequence, typically estimated using the mean squared error. However, this estimate proves to be biased in the presence of label autocorrelation. In this paper, we propose DistDF, which achieves alignment by alternately minimizing a discrepancy between the conditional forecast and label distributions. Because conditional discrepancies are difficult to estimate from finite time-series observations, we introduce a newly proposed joint-distribution Wasserstein discrepancy for time-series forecasting, which provably upper bounds the conditional discrepancy of interest. This discrepancy admits tractable, differentiable estimation from empirical samples and integrates seamlessly with gradient-based training. Extensive experiments show that DistDF improves the performance of diverse forecast models and achieves state-of-the-art forecasting performance. Code is available at https://anonymous.4open.science/r/DistDF-F66B.
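The claim that the discrepancy "integrates seamlessly with gradient-based training" amounts to adding a distributional term to the usual point-wise loss, leaving the model architecture untouched. A minimal sketch follows; the 1-D sorted-sample Wasserstein-1 used here is a simple stand-in for the paper's joint-distribution discrepancy, and the weighting parameter `lam` is a hypothetical knob, not one specified in the paper:

```python
import numpy as np

def w1_empirical(p, q):
    # 1-D empirical Wasserstein-1 via sorted samples (closed form for
    # equal-size samples); a placeholder for the paper's discrepancy.
    return float(np.mean(np.abs(np.sort(p) - np.sort(q))))

def distdf_style_loss(y_pred, y_true, lam=0.1):
    # point-wise term (standard DF objective) ...
    mse = float(np.mean((y_pred - y_true) ** 2))
    # ... plus a distributional alignment term over the flattened samples
    return mse + lam * w1_empirical(y_pred.ravel(), y_true.ravel())
```

Any forecaster trained by gradient descent could minimize such a combined objective in place of plain MSE, which is the sense in which the method is model-agnostic.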