Teacher Forcing as Generalized Bayes: Optimization Geometry Mismatch in Switching Surrogates for Chaotic Dynamics

📅 2026-04-28

🤖 AI Summary

This work addresses the geometric mismatch between identity teacher forcing (ITF) and the free-running marginal likelihood objective when training surrogate models for chaotic systems. The authors treat ITF, an intervention-based prediction loss, as a generalized Bayes update and compare its curvature with that of the marginal likelihood in a probabilistic switching augmentation of almost-linear RNNs (AL-RNNs), using Louis' identity to estimate ambiguity-aware observed information. In this switching setting, ITF inflates curvature because it conditions on a single forced regime path, whereas marginal likelihood curvature is reduced by a missing-information correction when multiple switching explanations remain plausible. In Lorenz-63 experiments, windowed evidence fine-tuning improves held-out evidence but can degrade dynamical quantities of interest (QoIs) relative to ITF-pretrained models.

📝 Abstract

Identity teacher forcing (ITF) enables stable training of deterministic recurrent surrogates for chaotic dynamical systems and has been highly effective for dynamical systems reconstruction (DSR) with recurrent neural networks (RNNs), including interpretable almost-linear RNNs (AL-RNNs). However, as an intervention-based prediction loss (and thus a generalized Bayes update), teacher forcing need not match the free-running model's marginal likelihood geometry. We compare the objective-induced curvatures of ITF and marginal likelihood in a probabilistic switching augmentation of AL-RNNs, estimating ambiguity-aware observed information via Louis' identity. In the switching setting studied here, conditioning on a single forced regime path (as ITF does) inflates curvature, while marginal likelihood curvature is reduced by a missing-information correction when multiple switching explanations remain plausible. In Lorenz-63 experiments, windowed evidence fine-tuning improves held-out evidence but can degrade dynamical quantities of interest (QoIs) relative to ITF-pretrained models.
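
The curvature comparison in the abstract can be illustrated with a toy example. Louis' identity states that the observed information of a latent-variable model equals the posterior expectation of the complete-data information minus the posterior covariance of the complete-data score (the missing-information term). The sketch below is not the paper's AL-RNN model or its ITF loss. It assumes a hypothetical two-regime Gaussian model with one scalar parameter `theta`, and all function names are illustrative. It compares the curvature obtained by conditioning on a single regime path with the marginal-likelihood curvature, and checks the latter against a finite-difference Hessian.

```python
import numpy as np

# Toy model (an assumption for illustration, not the paper's setup):
# regime z in {0, 1} with prior 1/2 each, sign s_z in {-1, +1},
# and observation x | z ~ N(theta * s_z, 1).

def log_joint(theta, x, z):
    """log p(x, z | theta) for each observation in x, with regime z fixed."""
    s = 2.0 * z - 1.0
    return np.log(0.5) - 0.5 * np.log(2.0 * np.pi) - 0.5 * (x - theta * s) ** 2

def log_marginal(theta, x):
    """log p(x | theta) summed over observations, regimes marginalized out."""
    lj = np.stack([log_joint(theta, x, 0.0), log_joint(theta, x, 1.0)])
    return np.logaddexp(lj[0], lj[1]).sum()

def louis_terms(theta, x):
    """Return (expected complete-data information, missing information)."""
    s = np.array([-1.0, 1.0])[:, None]              # regime signs, shape (2, 1)
    lj = np.stack([log_joint(theta, x, 0.0), log_joint(theta, x, 1.0)])
    post = np.exp(lj - np.logaddexp(lj[0], lj[1]))  # p(z | x, theta), shape (2, n)
    score = s * (x - theta * s)                     # d/dtheta log p(x, z | theta)
    neg_hess = s ** 2 * np.ones_like(x)             # -d2/dtheta2 log p(x, z | theta)
    complete = (post * neg_hess).sum()
    mean_score = (post * score).sum(axis=0)
    # Regimes are independent across observations given x, so the posterior
    # variance of the total score is the sum of per-observation variances.
    missing = (post * (score - mean_score) ** 2).sum()
    return complete, missing

def numeric_observed_info(theta, x, h=1e-4):
    """Central finite-difference estimate of -d2/dtheta2 log p(x | theta)."""
    f = lambda t: log_marginal(t, x)
    return -(f(theta + h) - 2.0 * f(theta) + f(theta - h)) / h ** 2

theta = 0.7
x = np.array([-1.2, -0.3, 0.1, 0.8, 1.5])

complete, missing = louis_terms(theta, x)
observed = complete - missing            # Louis' identity
numeric = numeric_observed_info(theta, x)

# Conditioning on any single regime path gives curvature s_z**2 = 1 per
# observation in this toy model, with no missing-information correction.
forced = float(x.size)

print("single-path curvature:     ", forced)
print("missing information:       ", missing)
print("marginal curvature (Louis):", observed)
print("marginal curvature (FD):   ", numeric)
```

In this toy model the single-path curvature is always at least as large as the marginal curvature, and the gap is largest for observations whose regime assignment is ambiguous. That mirrors, in a much simpler setting, the abstract's statement that conditioning on one forced regime path inflates curvature relative to the marginal likelihood.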

Problem

Research questions and friction points this paper is trying to address.

teacher forcing

chaotic dynamics

optimization geometry mismatch

marginal likelihood

dynamical systems reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

teacher forcing

generalized Bayes

chaotic dynamics