🤖 AI Summary
Standard Transformers’ fully connected attention mechanism neglects the inherent causality and locality of time series, limiting predictive performance. To address this, we propose Weighted Causal Attention (WCA), a novel attention mechanism that introduces a learnable weighting function based on smooth heavy-tailed decay, encoding temporal locality as an end-to-end differentiable inductive bias. WCA combines strict causal masking with principled power-law decay, yielding a Transformer variant that balances architectural flexibility with interpretability. Evaluated on multiple mainstream time-series forecasting benchmarks, the resulting model achieves state-of-the-art accuracy. Moreover, the learned attention weights exhibit clear, monotonic temporal decay, empirically confirming that explicit temporal priors improve both performance and interpretability.
📝 Abstract
Transformers have recently shown strong performance in time-series forecasting, but their all-to-all attention mechanism overlooks the causal and often temporally local nature of the data. We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay. This simple yet effective modification endows the model with an inductive bias favoring temporally local dependencies, while still allowing sufficient flexibility to learn the unique correlation structure of each dataset. Our empirical results demonstrate that Powerformer not only achieves state-of-the-art accuracy on public time-series benchmarks but also offers improved interpretability of attention patterns. Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention. These findings highlight the importance of domain-specific modifications to the Transformer architecture for time-series forecasting, and they establish Powerformer as a strong, efficient, and principled baseline for future research and real-world applications.
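To make the mechanism concrete, here is a minimal NumPy sketch of one plausible form of weighted causal attention: standard scaled dot-product scores are causally masked, and the resulting attention weights are reweighted by a heavy-tailed power-law kernel `w(d) = (1 + d)^(-alpha)` over the time lag `d`. The kernel's exact form, its placement after the softmax, and the fixed `alpha` are illustrative assumptions; in the paper the decay is a learnable, end-to-end differentiable component.

```python
import numpy as np

def weighted_causal_attention(q, k, v, alpha=1.0):
    """Sketch of a single weighted-causal-attention step.

    q, k, v: arrays of shape (T, d_k) for one head.
    alpha: power-law decay exponent (illustrative fixed value here;
           it would be learnable in an end-to-end model).
    """
    T, dk = q.shape
    scores = q @ k.T / np.sqrt(dk)                        # (T, T) similarities
    lags = np.arange(T)[:, None] - np.arange(T)[None, :]  # time lag d = i - j
    causal = lags >= 0                                    # mask future positions
    decay = np.where(causal, (1.0 + lags) ** (-alpha), 0.0)

    # Causal softmax over the past, then reweight by the power-law kernel
    # and renormalize so each row is again a probability distribution.
    scores = np.where(causal, scores, -np.inf)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = attn * decay
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn
```

Larger `alpha` concentrates each query's attention on its recent past, which is the locality bias the abstract describes; `alpha -> 0` recovers plain causal attention.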