AI Summary
Transformer encoders and graph convolutional networks (GCNs) have both been applied empirically to time series modeling, yet their theoretical relationship remains unclear. Method: This paper establishes a rigorous theoretical equivalence between Transformer encoders and multi-hop GCNs with dynamic adjacency matrices, showing that self-attention is mathematically equivalent to graph convolution over a time-varying graph topology. Building on this insight, we propose Fighter, a streamlined architecture that eliminates redundant linear projections, explicitly constructs dynamic adjacency matrices from attention weights, and models multi-scale temporal dependencies via multi-hop graph aggregation. Contribution/Results: Fighter is the first model to formally bridge Transformers and GCNs under a unified graph-theoretic framework, significantly enhancing interpretability. It achieves state-of-the-art or highly competitive performance on standard time series forecasting benchmarks, demonstrating that structural simplification and mechanistic clarity jointly improve transparency and predictive accuracy.
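The claimed equivalence between self-attention and graph convolution over a dynamic adjacency can be checked numerically: the softmax-normalized attention distribution is a row-stochastic matrix A, and applying it to the value-projected inputs is exactly a one-hop graph convolution A·H·W. A minimal NumPy sketch (all shapes and weight names here are illustrative, not the paper's):

```python
import numpy as np

# Hypothetical setup: a sequence of T time steps with model dimension d.
rng = np.random.default_rng(0)
T, d = 6, 4
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Self-attention: A is the attention distribution matrix.
A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))

# Read as a GCN layer, A acts as a dynamic (input-dependent) adjacency
# matrix and Wv as the graph-convolution weight: H' = A H Wv.
attn_out = A @ (X @ Wv)   # standard attention output
gcn_out = (A @ X) @ Wv    # one-hop graph convolution, same result
assert np.allclose(attn_out, gcn_out)

# A is row-stochastic: each time step's incoming edge weights sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
```

The two expressions agree by associativity of matrix multiplication, which is the mechanical core of the forward-pass equivalence described above.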
Abstract
Transformers have achieved remarkable success in time series modeling, yet their internal mechanisms remain opaque. This work demystifies the Transformer encoder by establishing its fundamental equivalence to a Graph Convolutional Network (GCN). We show that in the forward pass, the attention distribution matrix serves as a dynamic adjacency matrix, and its composition with the subsequent transformations performs computations analogous to graph convolution. Moreover, we demonstrate that in the backward pass, the update dynamics of the value and feed-forward projections mirror those of GCN parameters. Building on this unified theoretical reinterpretation, we propose Fighter (Flexible Graph Convolutional Transformer), a streamlined architecture that removes redundant linear projections and incorporates multi-hop graph aggregation. This perspective yields an explicit, interpretable representation of temporal dependencies across scales, naturally expressed as graph edges. Experiments on standard forecasting benchmarks confirm that Fighter achieves competitive performance while providing clearer mechanistic interpretability of its predictions.
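The multi-hop graph aggregation mentioned in the abstract can be sketched as follows: powers of the dynamic adjacency A propagate information k steps along the temporal graph, so summing weighted k-hop terms mixes dependencies at different temporal scales. The hop count K, the per-hop weights, and the additive combination rule below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, K = 6, 4, 3
H = rng.standard_normal((T, d))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Dynamic adjacency built from the data itself (attention-style scores).
A = softmax(H @ H.T / np.sqrt(d))

# k-hop propagation: A^k connects time steps k edges apart in the graph;
# summing over hops aggregates multiple temporal scales in one layer.
W = [rng.standard_normal((d, d)) for _ in range(K)]  # one weight per hop
out = np.zeros_like(H)
Ak = np.eye(T)
for k in range(K):
    Ak = Ak @ A            # now A^(k+1)
    out += Ak @ H @ W[k]   # aggregate the (k+1)-hop neighborhood
```

Each edge of A (and of its powers) has a direct reading as a weighted temporal dependency, which is what makes the graph view more interpretable than raw attention maps.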