Physics from Video: Identifiability of Time-Invariant Second-Order ODEs under Minimal Trajectory Conditions

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes an encoder-only, end-to-end framework that identifies the dynamical parameters of continuous-time physical systems, modeled as second-order linear ordinary differential equations, directly from raw video pixels. Grounded in structural identifiability theory, the method introduces a level-set slope-coverage condition under which the learned latent space is locally affine to the true physical state, and it pairs a decoder-free objective with a variance-floor regularizer to prevent latent collapse. The theoretical analysis gives the first characterization of the minimal number of trajectories needed for parameter identifiability across damping regimes: a single video suffices for underdamped systems, whereas the other regimes require three diverse trajectories. Experiments on synthetic and real-world data show that the model estimates interpretable physical constants without pixel-level reconstruction, supporting both physical correctness and model transparency.
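To make the damping-regime distinction concrete, here is a minimal sketch that classifies a second-order linear ODE and returns the trajectory count stated in the summary. The parameterization m x'' + c x' + k x = 0 and the function names `damping_regime` and `min_trajectories` are assumptions for illustration; this page does not give the paper's exact formulation.

```python
def damping_regime(m: float, c: float, k: float, tol: float = 1e-12) -> str:
    """Classify m*x'' + c*x' + k*x = 0 by the sign of its discriminant.

    The (m, c, k) parameterization is an assumption made for this sketch.
    """
    disc = c * c - 4.0 * m * k  # discriminant of m*s^2 + c*s + k = 0
    if abs(disc) <= tol:
        return "critical"       # repeated real root
    if disc < 0:
        return "underdamped"    # complex-conjugate roots, oscillatory decay
    return "overdamped"         # two distinct real roots


def min_trajectories(regime: str) -> int:
    """Trajectory counts as stated in the summary: 1 if underdamped, else 3."""
    return 1 if regime == "underdamped" else 3


if __name__ == "__main__":
    for params in [(1.0, 0.5, 4.0), (1.0, 4.0, 4.0), (1.0, 5.0, 4.0)]:
        regime = damping_regime(*params)
        print(params, regime, min_trajectories(regime))
```

The tolerance `tol` is there only because exact critical damping is a measure-zero case in floating point; its value is arbitrary.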

📝 Abstract

Bridging the gap between visual realism and physical understanding is a core challenge for video-based world models. We study the structural identifiability of continuous-time physical laws from raw pixels, focusing on whether an encoder-only pipeline can uniquely recover the parameters of second-order linear ODEs. We prove that a level-set slope-coverage condition ensures the learned latent space is locally affine to the true physical state, enabling exact parameter recovery. Our theory provides the first characterization of minimal data requirements across damping regimes, establishing that underdamped systems are identifiable from a single video clip, whereas other regimes require three diverse trajectories. We further introduce a variance-floor regularizer to stabilize the decoder-free objective and prevent latent collapse. Validated on synthetic and real-world data, our approach demonstrates that interpretable physical constants can be reliably estimated from video without the need for compute-intensive pixel reconstruction, ensuring both physical correctness and transparency. Code is available at https://github.com/wenjiewang3/PhysicsFromVideo.
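The abstract does not spell out the form of the variance-floor regularizer. One common way to write such a term is a hinge penalty on the per-dimension standard deviation of a batch of latents, similar in spirit to the variance term used in VICReg. The sketch below assumes that form; the function name `variance_floor_penalty` and the parameters `floor` and `eps` are hypothetical and may differ from the authors' implementation.

```python
import numpy as np


def variance_floor_penalty(z, floor: float = 1.0, eps: float = 1e-4) -> float:
    """Hinge penalty that is positive when any latent dimension's std < floor.

    z: array of shape (batch, latent_dim). The hinge form is an assumption,
    not the paper's confirmed objective.
    """
    z = np.asarray(z, dtype=float)
    # Per-dimension standard deviation over the batch; eps avoids sqrt(0).
    std = np.sqrt(z.var(axis=0) + eps)
    # Penalize only the shortfall below the floor, averaged over dimensions.
    return float(np.mean(np.maximum(0.0, floor - std)))


if __name__ == "__main__":
    collapsed = np.zeros((8, 2))                     # every latent identical
    spread = np.array([[-2.0, -2.0], [2.0, 2.0]] * 4)  # std of about 2 per dim
    print(variance_floor_penalty(collapsed))  # close to `floor`
    print(variance_floor_penalty(spread))     # zero, floor is satisfied
```

A collapsed encoder, which maps every frame to the same latent, incurs a penalty near `floor`, while latents that already vary enough contribute nothing, so the term does not interfere once the floor is met.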

Problem

Research questions and friction points this paper is trying to address.

structural identifiability

second-order ODEs

video-based physics

latent space

parameter recovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

structural identifiability

second-order ODEs

encoder-only video modeling