Exploring the Role of Token in Transformer-based Time Series Forecasting

📅 2024-04-16
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Existing Transformer-based time series forecasting models focus on optimizing model structure and largely overlook the role of tokens in prediction, even though distinguishing useful tokens from useless ones is crucial for accuracy. Method: Through gradient analysis, the authors show that prediction gradients depend mainly on a subset of "positive tokens" that contribute to the predicted series, and observe that positional encoding (PE) information weakens with network depth, impairing the model's ability to identify positive tokens in deeper layers. To address this, they propose temporal positional encoding (T-PE) for temporal tokens and variable positional encoding (V-PE) for variable tokens, integrated in T2B-PE, a Transformer-based dual-branch framework that models positional information along both the temporal and variable dimensions. Contribution/Results: Theoretical analysis and extensive experiments on multiple benchmarks demonstrate that T2B-PE achieves superior effectiveness and robustness, empirically validating token-aware positional modeling for deep Transformer encoders in time series forecasting.

📝 Abstract
Transformer-based methods are a mainstream approach for solving time series forecasting (TSF). These methods use temporal or variable tokens from observable data to make predictions. However, most focus on optimizing the model structure, with few studies paying attention to the role of tokens for predictions. The role is crucial since a model that distinguishes useful tokens from useless ones will predict more effectively. In this paper, we explore this issue. Through theoretical analyses, we find that the gradients mainly depend on tokens that contribute to the predicted series, called positive tokens. Based on this finding, we explore what helps models select these positive tokens. Through a series of experiments, we obtain three observations: i) positional encoding (PE) helps the model identify positive tokens; ii) as the network depth increases, the PE information gradually weakens, affecting the model's ability to identify positive tokens in deeper layers; iii) both enhancing PE in the deeper layers and using semantic-based PE can improve the model's ability to identify positive tokens, thus boosting performance. Inspired by these findings, we design temporal positional encoding (T-PE) for temporal tokens and variable positional encoding (V-PE) for variable tokens. To utilize T-PE and V-PE, we propose T2B-PE, a Transformer-based dual-branch framework. Extensive experiments demonstrate that T2B-PE has superior robustness and effectiveness.
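As a rough illustration of the dual-encoding idea described above, the hypothetical NumPy sketch below adds a temporal encoding (shared across variables, in the spirit of T-PE) and a variable encoding (shared across time steps, in the spirit of V-PE) to a multivariate token embedding. The function name, tensor shapes, and random encodings are assumptions for illustration only, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical sketch: inject positional information along both the
# temporal axis and the variable axis of a multivariate series embedding.
# Shapes and encodings here are illustrative assumptions, not the paper's
# exact T-PE / V-PE design.
def dual_positional_encoding(x, t_pe, v_pe):
    """x: [n_vars, seq_len, d_model]; t_pe: [seq_len, d_model]; v_pe: [n_vars, d_model]."""
    # broadcast the temporal encoding over variables and the
    # variable encoding over time steps, then add both to the tokens
    return x + t_pe[None, :, :] + v_pe[:, None, :]

rng = np.random.default_rng(0)
n_vars, seq_len, d_model = 7, 96, 16
x = rng.standard_normal((n_vars, seq_len, d_model))
t_pe = rng.standard_normal((seq_len, d_model)) * 0.02
v_pe = rng.standard_normal((n_vars, d_model)) * 0.02
out = dual_positional_encoding(x, t_pe, v_pe)
print(out.shape)  # (7, 96, 16)
```

Because the paper finds that PE information decays with depth, such encodings would presumably be re-injected or strengthened at deeper layers rather than added once at the input.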
Problem

Research questions and friction points this paper is trying to address.

Transformer-based TSF models mostly optimize model structure, with little attention to the role of tokens in prediction
Gradient analysis shows predictions depend mainly on "positive tokens" that contribute to the predicted series
Positional encoding information weakens with network depth, hurting positive-token identification in deeper layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Designs T-PE for temporal tokens and V-PE for variable tokens
Enhances PE in deeper layers and uses semantic-based PE to help identify positive tokens
Proposes T2B-PE, a Transformer-based dual-branch framework integrating both encodings
Jianqi Zhang
Institute of Software, Chinese Academy of Sciences
Jingyao Wang
University of Chinese Academy of Sciences, Beijing, China; National Key Laboratory of Space Integrated Information System, Institute of Software Chinese Academy of Sciences, Beijing, China
Chuxiong Sun
Xingchen Shen
Fanjiang Xu
University of Chinese Academy of Sciences, Beijing, China; National Key Laboratory of Space Integrated Information System, Institute of Software Chinese Academy of Sciences, Beijing, China
Changwen Zheng
Institute of Software, Chinese Academy of Sciences
Machine Learning, Computer Simulation
Wenwen Qiang
Institute of Software, Chinese Academy of Sciences
Artificial Intelligence, Machine Learning, Causal Inference, LLM/MLLM