🤖 AI Summary
Addressing the challenge of jointly modeling temporal dependencies and cross-variable dependencies in multivariate time series forecasting, this paper proposes a dual-path gated attention mechanism. First, variable-specific embeddings and temporal-path self-attention independently capture each variable's temporal dynamics; second, a variable-path self-attention module explicitly models inter-variable correlations, while a lightweight gating module dynamically fuses information from both paths. The design decouples yet jointly optimizes these two dependency types, enabling plug-and-play integration into Transformer-based and LLM-based forecasting frameworks. Extensive experiments on 13 real-world datasets demonstrate state-of-the-art performance, with improvements of up to 20.7% over the original models when integrated into existing forecasters. The implementation is publicly available.
📝 Abstract
There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformers presents a unique challenge, as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information within the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention over these learned embeddings. Gating operations in both the cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements of up to 20.7% over the original models. Code is available at this repository: https://github.com/nyuolab/Gateformer.
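The abstract's core recipe (variate-wise embedding of each series, cross-variate attention over those embeddings, and a gate that regulates how much attended information flows through) can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea, not the paper's implementation: the class name, dimensions, and gate parameterization are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class GatedVariateAttentionBlock(nn.Module):
    """Illustrative sketch: embed each variate's history independently,
    mix the embeddings with cross-variate self-attention, then fuse the
    two paths with a learned sigmoid gate. Names/sizes are hypothetical."""

    def __init__(self, seq_len: int, d_model: int, n_heads: int = 4):
        super().__init__()
        # Variate-wise embedding: map each variate's length-L history to d_model,
        # capturing its cross-time dynamics independently of other variates.
        self.embed = nn.Linear(seq_len, d_model)
        # Cross-variate attention over the per-variate embeddings
        # (sequence dimension here is the variate axis, not time).
        self.var_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate in (0, 1) deciding how much cross-variate information to admit.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_vars, seq_len) -> h: (batch, n_vars, d_model)
        h = self.embed(x)
        attn_out, _ = self.var_attn(h, h, h)              # cross-variate mixing
        g = self.gate(torch.cat([h, attn_out], dim=-1))   # elementwise gate
        return g * attn_out + (1.0 - g) * h               # gated fusion

# Toy usage: batch of 8 series, 7 variates, 96 time steps each.
x = torch.randn(8, 7, 96)
out = GatedVariateAttentionBlock(seq_len=96, d_model=64)(x)
print(tuple(out.shape))  # (8, 7, 64): one d_model vector per variate
```

A full forecaster would add a temporal-attention path before this block and a projection head mapping each variate's representation to the prediction horizon; the gate is what lets the model fall back to the purely per-variate path when cross-variate signal is unhelpful.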