🤖 AI Summary
Existing discrete token modeling (DTM) approaches struggle to capture multi-scale temporal dynamics in time series and lack a theoretically grounded optimization framework. To address these limitations, we propose the Multi-Scale Discrete Transformer (MSDformer). Our method introduces: (1) a multi-scale tokenization mechanism that jointly represents time-series structure at multiple granularities within a shared discrete latent space; (2) the first integration of rate-distortion theory into DTM, providing principled guidance on the compression-reconstruction trade-off; and (3) a multi-scale autoregressive discrete Transformer architecture that enables collaborative modeling across scales. Extensive experiments show that MSDformer significantly outperforms state-of-the-art methods across multiple benchmarks, with generated sequences improving substantially in fidelity, diversity, and long-range dependency modeling. Both theoretical analysis and empirical results consistently validate the effectiveness and generalization advantage of multi-scale representation in time-series DTM.
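The rate-distortion analysis mentioned above builds on the classical rate-distortion function from information theory; as a reference point (standard textbook notation, not the paper's own derivation), it is

```latex
R(D) \;=\; \min_{\substack{p(\hat{x}\mid x)\,:\; \mathbb{E}\left[d(x,\hat{x})\right] \le D}} I(X;\hat{X})
```

i.e., the minimum number of bits per symbol (mutual information between source $X$ and reconstruction $\hat{X}$) achievable while keeping the expected distortion $\mathbb{E}[d(x,\hat{x})]$ below a budget $D$. In the DTM setting, codebook size and tokenization granularity control where a model sits on this rate-distortion curve.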
📝 Abstract
Discrete Token Modeling (DTM), which employs vector quantization techniques, has demonstrated remarkable success in modeling non-natural-language modalities, particularly in time series generation. While our prior work SDformer established the first DTM-based framework to achieve state-of-the-art performance in this domain, two critical limitations persist in existing DTM approaches: 1) an inability to capture the multi-scale temporal patterns inherent in complex time series data, and 2) the absence of theoretical foundations to guide model optimization. To address these challenges, we propose a novel multi-scale DTM-based time series generation method, called Multi-Scale Discrete Transformer (MSDformer). MSDformer employs a multi-scale time series tokenizer to learn discrete token representations at multiple scales, which jointly characterize the complex nature of time series data. It then applies a multi-scale autoregressive token modeling technique to capture the multi-scale patterns of time series within the discrete latent space. Theoretically, we validate the effectiveness of the DTM approach and the rationality of MSDformer's design through rate-distortion theory. Comprehensive experiments demonstrate that MSDformer significantly outperforms state-of-the-art methods. Both theoretical analysis and experimental results show that incorporating multi-scale information and modeling multi-scale patterns substantially enhance the quality of time series generated by DTM-based approaches. The code will be released upon acceptance.
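To make the multi-scale tokenization idea concrete, here is a minimal numpy sketch of the general mechanism: a series is average-pooled at several temporal scales, split into patches, and each patch is assigned to its nearest codeword in a shared codebook. This is an illustrative simplification (the function name `multiscale_tokenize`, the pooling/patching choices, and the random codebook are all assumptions for the example, not MSDformer's actual learned tokenizer, which is trained end-to-end with vector quantization).

```python
import numpy as np

def multiscale_tokenize(series, codebook, scales=(1, 2, 4), patch=4):
    """Tokenize a 1-D series at several temporal scales with a shared codebook.

    At each scale the series is average-pooled by that factor, split into
    non-overlapping patches, and each patch is mapped to the index of its
    nearest codebook vector (squared L2 distance).
    """
    tokens = {}
    for s in scales:
        # average-pool by factor s (truncate the tail so it divides evenly)
        n = (len(series) // s) * s
        pooled = series[:n].reshape(-1, s).mean(axis=1)
        # split into non-overlapping patches of length `patch`
        m = (len(pooled) // patch) * patch
        patches = pooled[:m].reshape(-1, patch)
        # nearest-codeword assignment in the shared discrete latent space
        dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        tokens[s] = dists.argmin(axis=1)
    return tokens

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))               # 16 codewords of dimension 4
series = np.sin(np.linspace(0, 8 * np.pi, 64))    # toy periodic series
toks = multiscale_tokenize(series, codebook)       # one token sequence per scale
```

The coarser scales yield shorter token sequences over the same codebook, which is what lets a downstream autoregressive Transformer model cross-scale structure in one discrete latent space.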