MTM: A Multi-Scale Token Mixing Transformer for Irregular Multivariate Time Series Classification

📅 2025-09-22
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the classification of irregular multivariate time series (IMTS), where asynchronous channel observations impede effective modeling, this paper proposes the Multi-scale Token Mixing Transformer (MTM). MTM mitigates asynchrony via multi-scale downsampling, employs masked concatenation pooling to preserve critical temporal structure, and introduces cross-channel token mixing alongside channel-wise attention to explicitly enhance dynamic inter-channel interactions. Evaluated on multiple real-world IMTS benchmarks, MTM achieves significant gains over state-of-the-art methods, with up to a 3.8% improvement in AUPRC, setting new state-of-the-art results. Its core contribution is the first integration of multi-scale token mixing with masked pooling for IMTS modeling, addressing the challenge of collaborative learning across asynchronously observed channels.
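The summary's first step, down-sampling an irregular series while tracking which bins actually contain observations, can be illustrated with a small sketch. The paper does not give implementation details here, so the function name, window size, and masked-mean pooling rule below are all illustrative assumptions; the key idea shown is only that coarser bins are "observed" whenever any step inside them was observed, which makes channels better aligned at coarser scales.

```python
import numpy as np

def masked_pool(values: np.ndarray, mask: np.ndarray, window: int = 2):
    """Down-sample (T, C) values to (T // window, C) by masked mean pooling.

    mask[t, c] == 1 means channel c was actually observed at step t.
    A pooled bin counts as observed if any step inside it was observed,
    so coarser timescales have denser, better-aligned observations.
    (Illustrative sketch; not the paper's exact pooling operator.)
    """
    T, C = values.shape
    T_out = T // window
    v = values[: T_out * window].reshape(T_out, window, C)
    m = mask[: T_out * window].reshape(T_out, window, C)
    counts = m.sum(axis=1)                         # observations per bin
    pooled = (v * m).sum(axis=1) / np.maximum(counts, 1)
    pooled_mask = (counts > 0).astype(mask.dtype)  # bin observed at all?
    return pooled, pooled_mask

# Two channels observed at disjoint steps become aligned after pooling.
vals = np.array([[1.0, 0.0], [0.0, 3.0], [5.0, 0.0], [0.0, 7.0]])
mask = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
pooled, pooled_mask = masked_pool(vals, mask, window=2)
```

After one pooling step, both channels are observed in every bin even though their raw observations never coincide, which is the asynchrony-alleviation effect the summary describes.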

๐Ÿ“ Abstract
Irregular multivariate time series (IMTS) is characterized by the lack of synchronized observations across its different channels. In this paper, we point out that this channel-wise asynchrony can lead to poor channel-wise modeling of existing deep learning methods. To overcome this limitation, we propose MTM, a multi-scale token mixing transformer for the classification of IMTS. We find that the channel-wise asynchrony can be alleviated by down-sampling the time series to coarser timescales, and propose to incorporate a masked concat pooling in MTM that gradually down-samples IMTS to enhance the channel-wise attention modules. Meanwhile, we propose a novel channel-wise token mixing mechanism which proactively chooses important tokens from one channel and mixes them with other channels, to further boost the channel-wise learning of our model. Through extensive experiments on real-world datasets and comparison with state-of-the-art methods, we demonstrate that MTM consistently achieves the best performance on all the benchmarks, with improvements of up to 3.8% in AUPRC for classification.
Problem

Research questions and friction points this paper is trying to address.

Classifying irregular multivariate time series with unsynchronized channel observations
Addressing poor channel-wise modeling caused by asynchrony in existing methods
Enhancing channel-wise attention and token mixing for improved classification performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale token mixing transformer for irregular time series
Masked concat pooling gradually down-samples time series
Channel-wise token mixing mechanism boosts inter-channel learning
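The third innovation, proactively choosing important tokens from one channel and mixing them into the others, can be sketched roughly as follows. The paper does not specify its importance criterion or mixing rule, so the top-k-by-norm selection, the averaging, and the `alpha` blending weight below are all hypothetical stand-ins for the actual mechanism.

```python
import numpy as np

def token_mix(tokens: np.ndarray, k: int = 2, alpha: float = 0.5):
    """Blend each channel's 'important' tokens into the other channels.

    tokens: (C, T, D) per-channel token sequences.
    For each channel, pick its k highest-norm tokens as a proxy for
    importance, average them into a summary vector, and add a scaled
    copy of that summary to every other channel's tokens.
    (Illustrative sketch; not the paper's exact mixing mechanism.)
    """
    C, T, D = tokens.shape
    mixed = tokens.astype(float).copy()
    norms = np.linalg.norm(tokens, axis=-1)      # (C, T) importance proxy
    for c in range(C):
        top = np.argsort(norms[c])[-k:]          # indices of top-k tokens
        summary = tokens[c, top].mean(axis=0)    # (D,) channel summary
        for other in range(C):
            if other != c:
                mixed[other] += alpha * summary / (C - 1)
    return mixed

rng = np.random.default_rng(0)
mixed = token_mix(rng.normal(size=(3, 5, 4)))
```

A learned model would replace the norm heuristic with a trained scoring function and the additive blend with attention, but the sketch captures the cross-channel information flow the list item refers to.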
Shuhan Zhong
The Hong Kong University of Science and Technology, Hong Kong SAR, China
Weipeng Zhuo
Beijing Normal-Hong Kong Baptist University
Sizhe Song
The Hong Kong University of Science and Technology, Hong Kong SAR, China
Guanyao Li
Guangzhou Urban Planning and Design Survey Research Institute, Guangzhou, China
Zhongyi Yu
Beijing Normal-Hong Kong Baptist University, Zhuhai, China
S. -H. Gary Chan
The Hong Kong University of Science and Technology, Hong Kong SAR, China