🤖 AI Summary
Addressing the challenge of jointly modeling temporal dependencies and cross-variable dependencies in multivariate time series forecasting, this paper proposes a dual-path gated attention mechanism. First, variable-specific embeddings and temporal-path self-attention independently capture each variable's temporal dynamics; second, a variable-path self-attention module explicitly models inter-variable correlations, while a lightweight gating module dynamically fuses information from both paths. The design decouples yet jointly optimizes these two dependency types, enabling plug-and-play integration into Transformer-based and LLM-based forecasting frameworks. Extensive experiments on 13 real-world datasets demonstrate state-of-the-art performance, with improvements of up to 20.7% over the original models when integrated into existing forecasters. The implementation is publicly available.
📝 Abstract
There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformers presents a unique challenge, as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information within the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention over these learned embeddings. Gating operations in both the cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements of up to 20.7% over the original models. Code is available at this repository: https://github.com/nyuolab/Gateformer.
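The abstract's core recipe (variate-wise embedding of each series, cross-variate attention over those embeddings, and a gate that regulates how much attended information flows through) can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea, not the paper's implementation: the class name, dimensions, and gate parameterization are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class GatedVariateAttentionBlock(nn.Module):
    """Illustrative sketch: embed each variate's history independently,
    mix the embeddings with cross-variate self-attention, then fuse the
    two paths with a learned sigmoid gate. Names/sizes are hypothetical."""

    def __init__(self, seq_len: int, d_model: int, n_heads: int = 4):
        super().__init__()
        # Variate-wise embedding: map each variate's length-L history to d_model,
        # capturing its cross-time dynamics independently of other variates.
        self.embed = nn.Linear(seq_len, d_model)
        # Cross-variate attention over the per-variate embeddings
        # (sequence dimension here is the variate axis, not time).
        self.var_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate in (0, 1) deciding how much cross-variate information to admit.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_vars, seq_len) -> h: (batch, n_vars, d_model)
        h = self.embed(x)
        attn_out, _ = self.var_attn(h, h, h)              # cross-variate mixing
        g = self.gate(torch.cat([h, attn_out], dim=-1))   # elementwise gate
        return g * attn_out + (1.0 - g) * h               # gated fusion

# Toy usage: batch of 8 series, 7 variates, 96 time steps each.
x = torch.randn(8, 7, 96)
out = GatedVariateAttentionBlock(seq_len=96, d_model=64)(x)
print(tuple(out.shape))  # (8, 7, 64): one d_model vector per variate
```

A full forecaster would add a temporal-attention path before this block and a projection head mapping each variate's representation to the prediction horizon; the gate is what lets the model fall back to the purely per-variate path when cross-variate signal is unhelpful.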