🤖 AI Summary
This work addresses the lack of a general-purpose representation for transferring the next-token prediction (NTP) paradigm from natural language to unbounded, continuous time series. The authors propose UniTok, a universal tokenizer that discretizes time series into language-like tokens, and UniTok-FM, a foundation model that adopts an off-the-shelf large language model architecture and is pretrained via NTP over context windows formed from multiple series with similar patterns. According to the authors, UniTok-FM is the first model to support zero-shot and prompt-boosted forecasting together with few-shot generation and classification through training-free in-context inference. UniTok itself is a vector-quantized autoencoder that combines prefix normalization, a progressive-resolution causal encoder-decoder architecture, and a structure-preserving reconstruction loss. Experiments show that a single UniTok-FM consistently outperforms statistical and supervised baselines, is competitive with task-specific foundation models on forecasting, generation, and classification, and uniquely supports training-free in-context inference across tasks.
📝 Abstract
While Next-Token Prediction (NTP) has unified LLM pretraining, its adaptation to unbounded, continuous time series (TS) remains an open problem. To bridge this gap, we introduce UniTok, a universal tokenizer that transforms TS into discrete tokens, and UniTok-FM, a foundation model pretrained via NTP on these tokens. UniTok-FM is a general-purpose model that supports zero-shot and prompt-boosted forecasting, as well as few-shot generation and classification via training-free in-context inference, a capability not achieved by prior work. Technically, UniTok is a vector-quantized autoencoder that incorporates prefix normalization for scale stabilization, a progressive-resolution causal architecture for encoding and decoding, and a structure-preserving reconstruction loss for training. UniTok-FM adopts an off-the-shelf LLM architecture without TS-specific modifications. Instead of pretraining on isolated TS, it performs NTP on context windows formed from multiple series with similar patterns, aiming to capture their shared dynamics. Experiments on forecasting, generation, and classification show that a single unified UniTok-FM consistently outperforms statistical and supervised baselines, is competitive with task-specific foundation models, and uniquely enables training-free in-context inference across tasks.
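To make the tokenization idea concrete, the following is a minimal sketch of how a continuous series could be turned into discrete token ids by normalizing it and looking up the nearest codebook vector, as in a generic vector-quantized autoencoder. It is not the authors' implementation. The abstract does not define prefix normalization, so the sketch assumes it means scaling by the mean and standard deviation of a fixed-length prefix. The function names, `prefix_len`, `patch_len`, the flat patching, and the random codebook are all hypothetical stand-ins; the learned encoder, the progressive-resolution architecture, and the reconstruction loss are omitted.

```python
import numpy as np

def prefix_normalize(series, prefix_len=8, eps=1e-8):
    """Assumed scheme: scale the series by the mean/std of its first prefix_len points."""
    prefix = series[:prefix_len]
    mu, sigma = prefix.mean(), prefix.std()
    return (series - mu) / (sigma + eps)

def quantize(patches, codebook):
    """Standard VQ lookup: index of the nearest codebook vector for each patch."""
    # Squared Euclidean distances, shape (num_patches, codebook_size).
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def tokenize(series, codebook, patch_len=4, prefix_len=8):
    """Normalize, cut into non-overlapping patches, and map each patch to a token id."""
    normed = prefix_normalize(series, prefix_len)
    n = len(normed) // patch_len
    patches = normed[: n * patch_len].reshape(n, patch_len)
    return quantize(patches, codebook)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # random stand-in for a learned codebook
series = 100.0 * np.sin(np.linspace(0, 6 * np.pi, 64)) + 500.0
tokens = tokenize(series, codebook)  # 16 integer ids, each in [0, 16)
```

Under this assumed scheme, the normalized values are unchanged when the raw series is shifted or rescaled by a positive factor, which is one way a tokenizer could stabilize scale before quantization. The resulting ids could then be fed to a standard language model as ordinary tokens.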