FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three problems that existing methods face in long-horizon forecasting on large-scale spatial-temporal graphs: high computational cost, excessive memory consumption, and limited prediction accuracy. It proposes FaST, a framework that uses an adaptive graph agent attention mechanism to reduce the computational complexity of graph convolution and self-attention on large graphs. FaST also incorporates a parallel Mixture-of-Experts (MoE) module in which Gated Linear Units (GLUs) replace the traditional feed-forward networks, improving scalability and efficiency. The approach enables forecasting on graphs with thousands of nodes over horizons as long as 672 time steps (one week at a 15-minute granularity). Experiments on multiple real-world datasets show that FaST outperforms state-of-the-art baselines in long-horizon prediction accuracy while reducing both computational and memory overhead.
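To illustrate why routing attention through a small set of agent tokens can lower the cost on large graphs, here is a minimal sketch of generic agent attention. It is not the FaST implementation: the function names, the random (rather than learned) agent tokens, the absence of query/key/value projections, and the sizes `N`, `a`, and `d` are all assumptions made for illustration, and the "adaptive graph" component is not modeled.

```python
import math
import random

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def agent_attention(x, agents):
    """x: N x d node features; agents: a x d agent tokens with a << N (assumed)."""
    scale = 1.0 / math.sqrt(len(x[0]))
    # Step 1: each agent aggregates information from all nodes (a x N scores).
    agg = softmax_rows([[s * scale for s in row] for row in matmul(agents, transpose(x))])
    agent_values = matmul(agg, x)  # a x d
    # Step 2: each node reads back from the agents (N x a scores).
    read = softmax_rows([[s * scale for s in row] for row in matmul(x, transpose(agents))])
    return matmul(read, agent_values)  # N x d

random.seed(0)
N, a, d = 200, 8, 4  # hypothetical sizes
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
agents = [[random.gauss(0, 1) for _ in range(d)] for _ in range(a)]
out = agent_attention(x, agents)

# Number of attention scores computed: two N-by-a maps versus one N-by-N map.
agent_scores = 2 * N * a
full_scores = N * N
```

With these sizes the two agent maps hold 3,200 scores, against 40,000 for full pairwise self-attention, so the cost grows linearly in the number of nodes when the number of agents is fixed.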

📝 Abstract
Spatial-Temporal Graph (STG) forecasting on large-scale networks has garnered significant attention. However, existing models predominantly focus on short-horizon predictions and suffer from notorious computational costs and memory consumption when scaling to long-horizon predictions and large graphs. Targeting the above challenges, we present FaST, an effective and efficient framework based on heterogeneity-aware Mixture-of-Experts (MoEs) for long-horizon and large-scale STG forecasting, which unlocks one-week-ahead (672 steps at a 15-minute granularity) prediction with thousands of nodes. FaST is underpinned by two key innovations. First, an adaptive graph agent attention mechanism is proposed to alleviate the computational burden inherent in conventional graph convolution and self-attention modules when applied to large-scale graphs. Second, we propose a new parallel MoE module that replaces traditional feed-forward networks with Gated Linear Units (GLUs), enabling an efficient and scalable parallel structure. Extensive experiments on real-world datasets demonstrate that FaST not only delivers superior long-horizon predictive accuracy but also achieves remarkable computational efficiency compared to state-of-the-art baselines. Our source code is available at: https://github.com/yijizhao/FaST.
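As a rough illustration of the second idea, the sketch below shows a dense mixture of GLU experts combined by a softmax router. It is a simplified assumption, not the authors' module: the names `glu_expert`, `moe_glu`, and `rand_matrix`, the sigmoid gate, the dense (non-sparse) routing, and all dimensions are hypothetical, and the heterogeneity-aware routing described in the abstract is not modeled.

```python
import math
import random

def linear(x, w):
    """Multiply a feature vector x (length d_in) by a weight matrix w (d_in x d_out)."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(v):
    peak = max(v)
    exps = [math.exp(t - peak) for t in v]
    total = sum(exps)
    return [e / total for e in exps]

def glu_expert(x, w_value, w_gate):
    """Gated Linear Unit: a linear value path scaled elementwise by a sigmoid gate."""
    value = linear(x, w_value)
    gate = linear(x, w_gate)
    return [v * sigmoid(g) for v, g in zip(value, gate)]

def moe_glu(x, experts, w_router):
    """Evaluate every expert on x and mix the outputs with softmax router weights."""
    weights = softmax(linear(x, w_router))
    outputs = [glu_expert(x, w_value, w_gate) for w_value, w_gate in experts]
    d_out = len(outputs[0])
    mixed = [sum(w * out[j] for w, out in zip(weights, outputs)) for j in range(d_out)]
    return mixed, weights

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d_out, n_experts = 4, 3, 2  # hypothetical sizes
experts = [(rand_matrix(d_in, d_out, rng), rand_matrix(d_in, d_out, rng))
           for _ in range(n_experts)]
w_router = rand_matrix(d_in, n_experts, rng)

x = [0.5, -1.0, 0.25, 2.0]  # one node's feature vector (made-up values)
y, routing = moe_glu(x, experts, w_router)
```

Because each expert is a pair of independent linear maps, the experts can be evaluated side by side, which is one plausible reading of the "parallel structure" the abstract refers to.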
Problem

Research questions and friction points this paper is trying to address.

long-horizon forecasting
spatial-temporal graph
computational efficiency
large-scale networks
memory consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
Spatial-Temporal Graph
Long-Horizon Forecasting
Graph Attention
Gated Linear Units
Yiji Zhao
Yunnan University, School of Information Science and Engineering, Kunming, China
Zihao Zhong
Yunnan University, School of Information Science and Engineering, Kunming, China
Ao Wang
Yunnan University, School of Information Science and Engineering, Kunming, China
Haomin Wen
Carnegie Mellon University
Data Mining, Urban Computing, Spatio-Temporal Data Mining, Foundation Model
Ming Jin
Assistant Professor, School of ICT, Griffith University
Machine Learning, Time Series, Graph Data Mining, Multimodal Learning
Yuxuan Liang
Assistant Professor, Hong Kong University of Science and Technology (Guangzhou)
Spatio-Temporal Data Mining, Urban Computing, Urban AI, Foundation Models, Time Series
Huaiyu Wan
Beijing Jiaotong University, Beijing, China
Hao Wu
Yunnan University, School of Information Science and Engineering, Kunming, China