When Do Autoregressive Sequence Models Forecast Physical Wavefields? A Controlled Study on Synthetic Seismograms

📅 2026-06-09

🤖 AI Summary

This study addresses the instability of autoregressive models in long-horizon forecasting of oscillatory physical wavefields such as seismograms, where errors accumulate during free-running rollout and compound into phase drift. Using synthetic three-component seismograms as a testbed, the authors run controlled ablations on SeismoGPT to isolate which design choices keep multi-step rollout stable. The configuration studied combines multi-token prediction, a horizon-embedding hybrid prediction head, and a cross-horizon STFT-magnitude coherence loss, evaluated on free-running rollouts with paired significance tests. Multi-token prediction is the dominant stabilizer, accounting for almost the entire improvement over a single-token baseline (+0.040 median NCC); the other two components each add a small but consistent gain. Rollout generalization collapses when the context ratio falls below a threshold near one, roughly the full P-S interval of observed signal. The dominant residual failure is a polarity inversion, which a magnitude-based spectral loss cannot penalize by construction.

📝 Abstract

Long-horizon autoregressive forecasting of oscillatory physical signals, such as seismograms, gravitational-wave strain, and similar wavefields, is limited by error accumulation: as a causal model is fed its own outputs over hundreds of steps, small per-step errors compound into phase drift that pointwise metrics fail to detect. We ask when such rollout stays stable, using synthetic three-component seismograms as a physically structured testbed and the SeismoGPT autoregressive forecaster as the model under study. Through controlled, intra-architecture ablations evaluated on free-running rollout with paired significance tests, we isolate the contribution of each design choice. Multi-token prediction is the dominant stabilizer, accounting for almost the entire improvement over a single-token baseline (+0.040 median NCC); a horizon-embedding hybrid prediction head and a cross-horizon STFT-magnitude coherence loss each add a small but consistent further gain. Performance depends sharply on a context-ratio threshold near one, roughly the full P-S interval of observed signal, below which rollout generalization collapses. The dominant residual failure is a polarity inversion that a magnitude-based spectral loss cannot, by construction, penalize, identifying phase-aware objectives as the natural next step. We frame this as a controlled study of rollout stability on oscillatory wavefields, not a benchmark of forecasting architectures.
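The claim that a magnitude-based spectral loss cannot penalize a polarity inversion can be checked on a toy signal. The sketch below is illustrative only and is not the paper's code: the damped sinusoid, the frame and hop sizes, the helper names, and the zero-lag definition of NCC are all assumptions made for this example, and the plain mean-squared STFT-magnitude difference stands in for the paper's cross-horizon coherence loss, whose exact form is not given here.

```python
import numpy as np

def ncc(a, b):
    """Zero-lag normalized cross-correlation (assumed definition of NCC)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def stft_magnitude(x, frame=64, hop=16):
    """Magnitude of a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    frames = np.stack([x[s:s + frame] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def stft_magnitude_loss(pred, target):
    """Mean squared difference of STFT magnitudes (stand-in spectral loss)."""
    return float(np.mean((stft_magnitude(pred) - stft_magnitude(target)) ** 2))

# Toy "seismogram": a damped 12 Hz oscillation (hypothetical test signal).
t = np.linspace(0.0, 1.0, 512, endpoint=False)
target = np.exp(-3.0 * t) * np.sin(2.0 * np.pi * 12.0 * t)

# A polarity-inverted forecast: every sample has the wrong sign.
flipped = -target

ncc_flipped = ncc(flipped, target)                   # waveform metric sees the inversion
loss_flipped = stft_magnitude_loss(flipped, target)  # magnitude loss does not

print("NCC of inverted forecast:", ncc_flipped)
print("STFT-magnitude loss of inverted forecast:", loss_flipped)
```

Negating a signal negates every STFT coefficient, which leaves each magnitude unchanged, so the loss stays at zero while the NCC drops to its minimum. A loss that compares complex coefficients or waveform samples would see the sign change, which is consistent with the abstract's pointer toward phase-aware objectives.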

Problem

Research questions and friction points this paper is trying to address.

autoregressive forecasting

wavefields

error accumulation

phase drift

rollout stability

Innovation

Methods, ideas, or system contributions that make the work stand out.

autoregressive forecasting

rollout stability

multi-token prediction