🤖 AI Summary
This work addresses the failure of conventional time-embedding methods at long-horizon extrapolation of dynamic 3D scenes to unseen future timestamps. We propose a novel framework coupling 3D Gaussian splatting with latent-space neural ordinary differential equations (Neural ODEs). Specifically, a Transformer encodes historical trajectories into a continuous, differentiable latent state; a Neural ODE governs its physically plausible temporal evolution, enhanced by second-order derivative regularization and a variational objective to improve generalization. At inference, numerical integration of the latent flow yields Gaussian parameters at arbitrary future timestamps, which are then decoded and passed to efficient differentiable rendering in real time. Our method achieves state-of-the-art performance on D-NeRF and NVFI benchmarks: PSNR improves by up to 10 dB, and LPIPS decreases by 50%. To our knowledge, it is the first approach enabling millisecond-latency, high-fidelity, arbitrarily long-horizon dynamic 3D scene extrapolation and rendering.
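The extrapolation pipeline described above (latent state → Neural ODE flow → decoded Gaussian parameters) can be sketched in miniature. Everything below is illustrative: the dimensions, the random weights standing in for trained networks, the RK4 integrator, and the names `vector_field`/`decode` are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this sketch.
LATENT_DIM = 8
GAUSS_PARAMS = 10  # e.g. 3 position + 4 rotation + 3 scale per Gaussian

# Tiny MLP standing in for the learned vector field f(z, t)
# that governs the latent flow dz/dt = f(z, t).
W1 = rng.standard_normal((LATENT_DIM + 1, 16)) * 0.1
W2 = rng.standard_normal((16, LATENT_DIM)) * 0.1

def vector_field(z, t):
    h = np.tanh(np.concatenate([z, [t]]) @ W1)
    return h @ W2

def rk4_step(z, t, dt):
    # Classic fourth-order Runge-Kutta step for the latent ODE.
    k1 = vector_field(z, t)
    k2 = vector_field(z + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = vector_field(z + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = vector_field(z + dt * k3, t + dt)
    return z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Linear decoder standing in for the Gaussian-parameter head.
W_dec = rng.standard_normal((LATENT_DIM, GAUSS_PARAMS)) * 0.1

def decode(z):
    return z @ W_dec

# Integrate the latent state from the end of the training window
# (t = 1.0 here) to an arbitrary future timestamp, then decode.
z = rng.standard_normal(LATENT_DIM)  # stand-in for the Transformer summary
t, t_end, dt = 1.0, 2.0, 0.05
while t < t_end - 1e-9:
    z = rk4_step(z, t, dt)
    t += dt

gaussian_params = decode(z)
print(gaussian_params.shape)  # (10,)
```

Because the integrator is queried at a continuous time `t_end`, the same loop reaches any future instant by changing one number, which is the property that lets the method render arbitrarily far beyond the training window.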
📝 Abstract
We present ODE-GS, a novel method that unifies 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to forecast dynamic 3D scenes far beyond the time span seen during training. Existing neural rendering systems, whether NeRF- or 3DGS-based, embed time directly in a deformation network and therefore excel at interpolation but collapse when asked to predict the future, where timestamps are strictly out-of-distribution. ODE-GS eliminates this dependency: after learning a high-fidelity, time-conditioned deformation model for the training window, we freeze it and train a Transformer encoder that summarizes past Gaussian trajectories into a latent state whose continuous evolution is governed by a neural ODE. Numerical integration of this latent flow yields smooth, physically plausible Gaussian trajectories that can be queried at any future instant and rendered in real time. Coupled with a variational objective and a lightweight second-derivative regularizer, ODE-GS attains state-of-the-art extrapolation on D-NeRF and NVFI benchmarks, improving PSNR by up to 10 dB and halving perceptual error (LPIPS) relative to the strongest baselines. Our results demonstrate that continuous-time latent dynamics are a powerful, practical route to photorealistic prediction of complex 3D scenes.
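The "lightweight second-derivative regularizer" mentioned above can be understood as penalizing the acceleration of the latent trajectory, i.e. the magnitude of d²z/dt² = d/dt f(z(t), t). A minimal sketch of one way to compute such a penalty with a finite difference along the flow follows; the random weights, the Euler micro-step, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

LATENT = 8

# Random stand-in for a trained latent vector field f(z, t).
W = rng.standard_normal((LATENT + 1, LATENT)) * 0.1

def f(z, t):
    return np.tanh(np.concatenate([z, [t]]) @ W)

def second_derivative_penalty(z, t, eps=1e-3):
    # d^2 z / dt^2 is the total derivative of f along the trajectory.
    # Approximate it by advancing z one Euler micro-step of size eps
    # and differencing the velocities.
    v0 = f(z, t)
    v1 = f(z + eps * v0, t + eps)
    accel = (v1 - v0) / eps
    return float(np.sum(accel ** 2))

z = rng.standard_normal(LATENT)
penalty = second_derivative_penalty(z, 0.5)
print(penalty >= 0.0)  # True
```

Adding such a term to the training loss discourages abrupt changes in latent velocity, which is one plausible mechanism behind the smooth, physically plausible trajectories the abstract describes.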