FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting

📅 2024-10-15
🏛️ arXiv.org
📈 Citations: 8
Influential: 1
🤖 AI Summary
Existing time-series forecasting (TSF) methods suffer from poor generalization and strong domain dependence, while time-series foundation models (TSFMs) lack a unified, comprehensive evaluation benchmark. Method: FoundTS is a holistic benchmark for TSFMs supporting zero-shot, few-shot, and full-shot evaluation of both LLM-based and pure time-series pre-trained models. It provides a standardized evaluation framework with multi-shot paradigms, cross-architecture support (LLM-based vs. TS-specific models), and multi-domain coverage (e.g., finance, meteorology, energy), backed by standardized dataset splitting, loading, normalization, and few-shot sampling protocols. Contribution/Results: Extensive experiments on diverse, heterogeneous datasets systematically expose generalization bottlenecks and cross-domain transfer limitations of current TSFMs. Code and datasets are publicly released to foster reproducible and fair TSFM research.

📝 Abstract
Time Series Forecasting (TSF) is a key functionality in numerous fields, including finance, weather services, and energy management. While new TSF methods continue to emerge, many of them require domain-specific data collection and model training, and they struggle with poor generalization on new domains. Foundation models aim to overcome this limitation. Pre-trained on large-scale language or time series data, they exhibit promising inference capabilities on new or unseen data. This has spurred a surge of new TSF foundation models. We propose a new benchmark, FoundTS, to enable thorough and fair evaluation and comparison of such models. FoundTS covers a variety of TSF foundation models, including those based on large language models and those pretrained on time series. Next, FoundTS supports different forecasting strategies, including zero-shot, few-shot, and full-shot, thereby facilitating more thorough evaluations. Finally, FoundTS offers a pipeline that standardizes evaluation processes such as dataset splitting, loading, normalization, and few-shot sampling, thereby facilitating fair evaluations. Building on this, we report an extensive evaluation of TSF foundation models on a broad range of datasets from diverse domains and with different statistical characteristics. Specifically, we identify the pros, cons, and inherent limitations of existing foundation models, and we identify directions for future model design. We make our code and datasets available at https://anonymous.4open.science/r/FoundTS-C2B0.
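The standardized pipeline steps named in the abstract (dataset splitting, normalization, few-shot sampling) can be illustrated with a minimal sketch. This is not the FoundTS codebase; all function names here are hypothetical, and the choices (chronological split ratios, z-score normalization from training statistics only, a deterministic contiguous few-shot suffix) are assumptions about how such a pipeline is typically standardized:

```python
import numpy as np

def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a series chronologically into train/val/test (no shuffling)."""
    n = len(series)
    t = int(n * train_frac)
    v = int(n * (train_frac + val_frac))
    return series[:t], series[t:v], series[v:]

def zscore_normalize(train, *others):
    """Normalize every split using statistics of the training split only,
    so no test information leaks into preprocessing."""
    mu, sigma = train.mean(), train.std() + 1e-8
    return tuple((s - mu) / sigma for s in (train, *others))

def few_shot_sample(train, fraction):
    """Take a deterministic contiguous suffix of the training data
    (e.g. the most recent 5%) so every model sees identical few-shot data."""
    k = max(1, int(len(train) * fraction))
    return train[-k:]
```

Keeping the split chronological and deriving normalization statistics only from the training portion are the two details that most often differ between papers; fixing them in one shared pipeline is what makes cross-model comparisons fair.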
Problem

Research questions and friction points this paper is trying to address.

Evaluating generalizability of Time Series Foundation Models across domains
Standardizing benchmarking for diverse forecasting scenarios (zero/few/full-shot)
Identifying limitations and improvement directions for existing TSFMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for Time Series Foundation Models
Supports zero-shot, few-shot, full-shot scenarios
Standardized protocols for fair evaluation
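The three scenarios above differ only in how much of the training data a model may use before forecasting. A minimal sketch of such a strategy dispatch, with a hypothetical `model_forecast(context, horizon)` interface and MSE scoring (the interface and the zero-shot context length are assumptions, not the benchmark's actual API):

```python
import numpy as np

def mse(pred, target):
    """Mean squared forecasting error."""
    return float(np.mean((pred - target) ** 2))

def evaluate(model_forecast, train, test, horizon,
             strategy="zero-shot", few_shot_frac=0.05):
    """Score one forecaster under a given data-access strategy.
    model_forecast(context, horizon) must return `horizon` predictions."""
    if strategy == "zero-shot":
        # No adaptation data: only a short recent context window.
        context = train[-2 * horizon:]
    elif strategy == "few-shot":
        # A small fixed fraction of the most recent training data.
        k = max(1, int(len(train) * few_shot_frac))
        context = train[-k:]
    else:  # full-shot
        context = train
    pred = model_forecast(context, horizon)
    return mse(pred, test[:horizon])
```

For example, a naive last-value forecaster `lambda ctx, h: np.full(h, ctx[-1])` can be scored under all three strategies with the same call, which is the point of a unified protocol.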