Fighter: Unveiling the Graph Convolutional Nature of Transformers in Time Series Modeling

📅 2025-10-19
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Transformer encoders and graph convolutional networks (GCNs) have been empirically applied to time series modeling, yet their theoretical relationship remains unclear. Method: This paper establishes a rigorous theoretical equivalence between Transformer encoders and multi-hop GCNs with dynamic adjacency matrices, showing that self-attention is mathematically equivalent to graph convolution over a time-varying graph topology. Building on this insight, we propose Fighter, a streamlined architecture that eliminates redundant linear projections, explicitly constructs dynamic adjacency matrices from attention weights, and models multi-scale temporal dependencies via multi-hop graph aggregation. Contribution/Results: Fighter is the first model to formally bridge Transformers and GCNs under a unified graph-theoretic framework, significantly enhancing interpretability. It achieves state-of-the-art or highly competitive performance across multiple standard time series forecasting benchmarks, demonstrating that structural simplification and mechanistic clarity jointly improve both transparency and predictive accuracy.

πŸ“ Abstract
Transformers have achieved remarkable success in time series modeling, yet their internal mechanisms remain opaque. This work demystifies the Transformer encoder by establishing its fundamental equivalence to a Graph Convolutional Network (GCN). We show that in the forward pass, the attention distribution matrix serves as a dynamic adjacency matrix, and its composition with subsequent transformations performs computations analogous to graph convolution. Moreover, we demonstrate that in the backward pass, the update dynamics of value and feed-forward projections mirror those of GCN parameters. Building on this unified theoretical reinterpretation, we propose Fighter (Flexible Graph Convolutional Transformer), a streamlined architecture that removes redundant linear projections and incorporates multi-hop graph aggregation. This perspective yields an explicit and interpretable representation of temporal dependencies across different scales, naturally expressed as graph edges. Experiments on standard forecasting benchmarks confirm that Fighter achieves competitive performance while providing clearer mechanistic interpretability of its predictions.
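The forward-pass equivalence the abstract describes can be checked in a few lines of NumPy: the row-stochastic attention matrix A = softmax(QKᵀ/√d) plays the role of a normalized dynamic adjacency, and by associativity the attention output A(XW_v) is exactly the graph convolution (AX)W_v. This is an illustrative sketch, not the paper's code; the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                       # sequence length, feature dimension
X = rng.standard_normal((T, d))   # time-series token embeddings
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Self-attention: A is computed from the input, so it acts as a
# dynamic (input-dependent) adjacency matrix over the T time steps.
A = softmax((X @ W_q) @ (X @ W_k).T / np.sqrt(d))
attn_out = A @ (X @ W_v)

# GCN view: aggregate neighbor features with A, then apply a shared
# linear transform W_v -- the same computation, by associativity.
gcn_out = (A @ X) @ W_v

assert np.allclose(attn_out, gcn_out)   # identical outputs
assert np.allclose(A.sum(axis=1), 1.0)  # rows sum to 1: normalized adjacency
```

Each row of A sums to one, so attention aggregates past and future time steps exactly like message passing over a row-normalized graph.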
Problem

Research questions and friction points this paper is trying to address.

Reveals Transformer's equivalence to Graph Convolutional Networks in time series
Proposes streamlined architecture removing redundant projections for interpretability
Models temporal dependencies as explicit graph edges across different scales
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer encoder equivalent to Graph Convolutional Network
Attention matrix acts as dynamic adjacency for convolution
Simplified architecture removes redundant projections and adds multi-hop graph aggregation
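The multi-hop aggregation mentioned above can be sketched as repeated propagation through the same adjacency: hop k mixes information along paths of length k, i.e. increasingly long-range temporal dependencies. This is a minimal sketch under the assumption that the adjacency comes from attention weights (here a random row-normalized stand-in); the combination step is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, hops = 6, 4, 3
X = rng.standard_normal((T, d))

# Stand-in dynamic adjacency (in the paper it is built from attention
# weights); row-normalized so each hop is a weighted average of neighbors.
A = rng.random((T, T))
A = A / A.sum(axis=1, keepdims=True)

# Multi-hop aggregation: hop k applies A k times, i.e. uses A^k,
# capturing temporal dependencies at progressively coarser scales.
hop_feats = []
H = X
for _ in range(hops):
    H = A @ H            # one more hop of graph propagation
    hop_feats.append(H)

# Combine the scales, e.g. by concatenating hop features per time step.
multi_scale = np.concatenate(hop_feats, axis=1)  # shape (T, hops * d)
assert multi_scale.shape == (T, hops * d)
assert np.allclose(hop_feats[1], A @ A @ X)      # second hop uses A^2
```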
Authors
Chen Zhang (The University of Hong Kong)
Weixin Bu (Reversible Inc)
Wendong Xu (The University of Hong Kong)
Runsheng Yu (unknown affiliation)
Yik-Chung Wu (The University of Hong Kong)
Ngai Wong (The University of Hong Kong)