🤖 AI Summary
This work addresses the joint estimation of object geometry and governing physical laws from monocular video, without prior knowledge of material properties. We propose the first material-agnostic visual system identification framework: it replaces hand-crafted physical equations with a learnable neural constitutive model and jointly optimizes geometry and dynamics via differentiable rendering and continuum physics simulation. To stabilize state inference and improve its accuracy, we introduce a dense geometric guidance mechanism that leverages particle trajectory reconstruction and motion constraints. Our method requires no predefined material parameters and achieves state-of-the-art performance in geometric reconstruction fidelity, synthesized image quality, and cross-material generalization, significantly advancing end-to-end, interpretable physical modeling of complex dynamic scenes.
📝 Abstract
System identification from videos aims to recover object geometry and governing physical laws. Existing methods integrate differentiable rendering with simulation but rely on predefined material priors, limiting their ability to handle unknown materials. We introduce MASIV, the first vision-based framework for material-agnostic system identification. Unlike existing approaches that depend on hand-crafted constitutive laws, MASIV employs learnable neural constitutive models, inferring object dynamics without assuming a scene-specific material prior. However, the absence of full particle state information poses unique challenges, leading to unstable optimization and physically implausible behaviors. To address this, we introduce dense geometric guidance by reconstructing continuum particle trajectories, providing temporally rich motion constraints beyond sparse visual cues. Comprehensive experiments show that MASIV achieves state-of-the-art performance in geometric accuracy, rendering quality, and generalization ability.
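The core idea, fitting a learnable constitutive model so that simulated motion matches observed trajectories instead of assuming a material law, can be illustrated with a deliberately tiny sketch. Everything below is an assumption for exposition (a 1-D particle world, a toy `NeuralConstitutive` MLP, finite-difference gradients in place of differentiable simulation and rendering); it is not MASIV's actual architecture or losses.

```python
import numpy as np

# Illustrative sketch only: class names and the 1-D toy world are assumptions,
# not MASIV's implementation.

rng = np.random.default_rng(0)

class NeuralConstitutive:
    """Tiny MLP mapping strain -> stress, replacing a hand-crafted material law."""
    def __init__(self, hidden=8):
        self.w1 = rng.normal(0.0, 0.5, (hidden,))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.5, (hidden,))

    def stress(self, strain):
        return float(self.w2 @ np.tanh(self.w1 * strain + self.b1))

    def params(self):
        return [self.w1, self.b1, self.w2]

def simulate(model, steps=40, dt=0.05, x0=1.2):
    """Semi-implicit Euler for one particle pulled back by the learned stress."""
    x, v, traj = x0, 0.0, []
    for _ in range(steps):
        v -= model.stress(x) * dt
        x += v * dt
        traj.append(x)
    return np.array(traj)

def loss(model, observed):
    # Trajectory mismatch stands in for the rendering + geometric-guidance losses.
    return float(np.mean((simulate(model) - observed) ** 2))

class TrueMaterial:  # unknown ground-truth material (linear, stiffness 2)
    def stress(self, strain):
        return 2.0 * strain

observed = simulate(TrueMaterial())   # plays the role of the observed video

model = NeuralConstitutive()
init_loss = loss(model, observed)
lr, eps = 0.01, 1e-4
for _ in range(300):
    # Finite differences stand in for autodiff through simulator and renderer.
    for p in model.params():
        g = np.zeros_like(p)
        for i in np.ndindex(p.shape):
            old = p[i]
            p[i] = old + eps; lp = loss(model, observed)
            p[i] = old - eps; lm = loss(model, observed)
            p[i] = old
            g[i] = (lp - lm) / (2 * eps)
        p -= lr * g

print(f"loss: {init_loss:.4f} -> {loss(model, observed):.4f}")
```

In this toy setting the fitted network recovers a stress response consistent with the unknown material purely from the trajectory mismatch, mirroring (at miniature scale) why no scene-specific material prior is needed.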