📝 Abstract
To address the dual challenges of inherent stochasticity and non-differentiable evaluation metrics in physical spatiotemporal forecasting, we propose Spatiotemporal Forecasting as Planning (SFP), a new paradigm grounded in model-based reinforcement learning. SFP constructs a novel generative world model to simulate diverse, high-fidelity future states, enabling an "imagination-based" environmental simulation. Within this framework, a base forecasting model acts as an agent, guided by a beam-search-based planning algorithm that uses non-differentiable domain metrics as reward signals to explore high-return future sequences. The identified high-reward candidates then serve as pseudo-labels for iterative self-training, which continuously optimizes the agent's policy. This significantly reduces prediction error and yields strong performance on critical domain metrics such as the detection of extreme events.
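The planning step described above can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's implementation: `sample_next` stands in for the generative world model, `reward_fn` for a sparse, non-differentiable domain metric (here an extreme-event hit indicator), and the best trajectory returned by the beam search is what would be reused as a pseudo-label for self-training.

```python
import numpy as np

def plan_with_beam_search(sample_next, reward_fn, s0, horizon,
                          n_samples, beam_width, rng):
    """Beam-search planning over imagined rollouts (illustrative sketch)."""
    # Each beam holds (trajectory, cumulative reward); start from state s0.
    beams = [([s0], 0.0)]
    for _ in range(horizon):
        candidates = []
        for traj, ret in beams:
            for _ in range(n_samples):
                s = sample_next(traj[-1], rng)   # world model: sample a future state
                r = reward_fn(s)                 # non-differentiable metric as reward
                candidates.append((traj + [s], ret + r))
        # Keep only the top-k highest-return partial trajectories.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    # The best imagined trajectory would serve as a pseudo-label.
    return beams[0]

# Toy stand-ins (assumptions, not the paper's models):
def sample_next(s, rng):
    return s + rng.normal(0.0, 0.5)              # stochastic one-step dynamics

def reward_fn(s):
    return float(s > 1.0)                        # sparse "extreme-event hit" signal

rng = np.random.default_rng(0)
traj, ret = plan_with_beam_search(sample_next, reward_fn, s0=0.0, horizon=5,
                                  n_samples=8, beam_width=4, rng=rng)
```

Because the reward is a plain Python function, no gradient ever flows through it; the metric shapes the policy only through which trajectories survive the beam and become training targets.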