Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations

πŸ“… 2025-05-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Addressing the challenge of jointly modeling temporal (cross-time) and cross-variate dependencies in multivariate time series forecasting, this paper re-purposes the Transformer architecture around a gated two-path attention design. First, each variate is embedded independently and temporal self-attention captures its cross-time dynamics; second, a variate-wise self-attention module explicitly models inter-variate correlations, while lightweight gating operations in both phases regulate information flow and fuse the two paths. This design decouples the two dependency types while optimizing them jointly, and integrates plug-and-play into Transformer-based and LLM-based forecasting frameworks. Experiments on 13 real-world datasets demonstrate state-of-the-art performance, with improvements of up to 20.7% over the original models when integrated into existing forecasters. The implementation is publicly available.

πŸ“ Abstract
There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformer presents a unique challenge as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information in the context of the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention mechanisms on these learned embeddings. Gating operations in both cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements up to 20.7% over original models. Code is available at this repository: https://github.com/nyuolab/Gateformer.
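The abstract's pipeline (embed each variate independently, attend across time, attend across variates, and gate the fusion of the two paths) can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's implementation: the mean-pooling over time, the single-layer sigmoid gate, and all function names here are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the last two axes.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_dual_path(x, w_gate):
    """x: (variates, time, d) per-variate embeddings.

    Temporal path: self-attention over time, within each variate.
    Variate path: self-attention across variates (here over a
    time-pooled token per variate -- an illustrative choice).
    A sigmoid gate fuses the two paths feature-wise.
    """
    temporal = attention(x, x, x)                  # (V, T, d) cross-time
    pooled = x.mean(axis=1)                        # (V, d) one token per variate
    cross_var = attention(pooled, pooled, pooled)  # (V, d) cross-variate
    cross_var = cross_var[:, None, :]              # broadcast over time
    gate = sigmoid(x @ w_gate)                     # (V, T, d) learned gate
    return gate * temporal + (1.0 - gate) * cross_var

rng = np.random.default_rng(0)
V, T, d = 4, 16, 8                                 # toy sizes
x = rng.standard_normal((V, T, d))
w_gate = rng.standard_normal((d, d)) * 0.1         # hypothetical gate weights
out = gated_dual_path(x, w_gate)
print(out.shape)  # (4, 16, 8)
```

The gate lets the model lean on temporal dynamics for some features and cross-variate correlations for others, which is the "regulate information flow" behavior the abstract describes.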
Problem

Research questions and friction points this paper is trying to address.

Modeling temporal and variate dependencies in multivariate time series
Integrating cross-time and cross-variate information efficiently
Improving Transformer performance for time series forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses temporal and variate-wise attention mechanisms
Employs gated representations for information flow
Integrates cross-time and cross-variate dependencies effectively
πŸ”Ž Similar Papers
No similar papers found.
Yu-Hsiang Lan
New York University, New York, NY, USA
Anton Alyakin
Medical student at Washington University
LLMs · neurosurgery · networks · causality
E. Oermann
Department of Neurosurgery, New York University, New York, NY, USA