MTM: A Multi-Scale Token Mixing Transformer for Irregular Multivariate Time Series Classification

📅 2025-09-22
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the classification of irregular multivariate time series (IMTS), where asynchronous channel observations impede effective modeling, this paper proposes the Multi-scale Token Mixing Transformer (MTM). MTM mitigates asynchrony via multi-scale downsampling, employs masked concatenation pooling to preserve critical temporal structure, and introduces cross-channel token mixing alongside channel-wise attention to explicitly enhance dynamic inter-channel interactions. Evaluated on multiple real-world IMTS benchmarks, MTM achieves significant gains over state-of-the-art methods, with up to a 3.8% improvement in AUPRC, setting new state-of-the-art results. Its core contribution is the first integration of multi-scale token mixing with masked pooling for IMTS modeling, addressing the challenge of collaborative learning across asynchronously observed channels.
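The summary's first step, down-sampling an irregular series while tracking which bins actually contain observations, can be illustrated with a small sketch. The paper does not give implementation details here, so the function name, window size, and masked-mean pooling rule below are all illustrative assumptions; the key idea shown is only that coarser bins are "observed" whenever any step inside them was observed, which makes channels better aligned at coarser scales.

```python
import numpy as np

def masked_pool(values: np.ndarray, mask: np.ndarray, window: int = 2):
    """Down-sample (T, C) values to (T // window, C) by masked mean pooling.

    mask[t, c] == 1 means channel c was actually observed at step t.
    A pooled bin counts as observed if any step inside it was observed,
    so coarser timescales have denser, better-aligned observations.
    (Illustrative sketch; not the paper's exact pooling operator.)
    """
    T, C = values.shape
    T_out = T // window
    v = values[: T_out * window].reshape(T_out, window, C)
    m = mask[: T_out * window].reshape(T_out, window, C)
    counts = m.sum(axis=1)                         # observations per bin
    pooled = (v * m).sum(axis=1) / np.maximum(counts, 1)
    pooled_mask = (counts > 0).astype(mask.dtype)  # bin observed at all?
    return pooled, pooled_mask

# Two channels observed at disjoint steps become aligned after pooling.
vals = np.array([[1.0, 0.0], [0.0, 3.0], [5.0, 0.0], [0.0, 7.0]])
mask = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
pooled, pooled_mask = masked_pool(vals, mask, window=2)
```

After one pooling step, both channels are observed in every bin even though their raw observations never coincide, which is the asynchrony-alleviation effect the summary describes.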

๐Ÿ“ Abstract
Irregular multivariate time series (IMTS) is characterized by the lack of synchronized observations across its different channels. In this paper, we point out that this channel-wise asynchrony can lead to poor channel-wise modeling of existing deep learning methods. To overcome this limitation, we propose MTM, a multi-scale token mixing transformer for the classification of IMTS. We find that the channel-wise asynchrony can be alleviated by down-sampling the time series to coarser timescales, and propose to incorporate a masked concat pooling in MTM that gradually down-samples IMTS to enhance the channel-wise attention modules. Meanwhile, we propose a novel channel-wise token mixing mechanism which proactively chooses important tokens from one channel and mixes them with other channels, to further boost the channel-wise learning of our model. Through extensive experiments on real-world datasets and comparison with state-of-the-art methods, we demonstrate that MTM consistently achieves the best performance on all the benchmarks, with improvements of up to 3.8% in AUPRC for classification.
Problem

Research questions and friction points this paper is trying to address.

Classifying irregular multivariate time series with unsynchronized channel observations
Addressing poor channel-wise modeling caused by asynchrony in existing methods
Enhancing channel-wise attention and token mixing for improved classification performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale token mixing transformer for irregular time series
Masked concat pooling gradually down-samples time series
Channel-wise token mixing mechanism boosts inter-channel learning
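The third innovation, proactively choosing important tokens from one channel and mixing them into the others, can be sketched roughly as follows. The paper does not specify its importance criterion or mixing rule, so the top-k-by-norm selection, the averaging, and the `alpha` blending weight below are all hypothetical stand-ins for the actual mechanism.

```python
import numpy as np

def token_mix(tokens: np.ndarray, k: int = 2, alpha: float = 0.5):
    """Blend each channel's 'important' tokens into the other channels.

    tokens: (C, T, D) per-channel token sequences.
    For each channel, pick its k highest-norm tokens as a proxy for
    importance, average them into a summary vector, and add a scaled
    copy of that summary to every other channel's tokens.
    (Illustrative sketch; not the paper's exact mixing mechanism.)
    """
    C, T, D = tokens.shape
    mixed = tokens.astype(float).copy()
    norms = np.linalg.norm(tokens, axis=-1)      # (C, T) importance proxy
    for c in range(C):
        top = np.argsort(norms[c])[-k:]          # indices of top-k tokens
        summary = tokens[c, top].mean(axis=0)    # (D,) channel summary
        for other in range(C):
            if other != c:
                mixed[other] += alpha * summary / (C - 1)
    return mixed

rng = np.random.default_rng(0)
mixed = token_mix(rng.normal(size=(3, 5, 4)))
```

A learned model would replace the norm heuristic with a trained scoring function and the additive blend with attention, but the sketch captures the cross-channel information flow the list item refers to.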
Shuhan Zhong
The Hong Kong University of Science and Technology, Hong Kong SAR, China
Weipeng Zhuo
Beijing Normal-Hong Kong Baptist University
Sizhe Song
The Hong Kong University of Science and Technology, Hong Kong SAR, China
Guanyao Li
Guangzhou Urban Planning and Design Survey Research Institute, Guangzhou, China
Zhongyi Yu
Beijing Normal-Hong Kong Baptist University, Zhuhai, China
S. -H. Gary Chan
The Hong Kong University of Science and Technology, Hong Kong SAR, China