🤖 AI Summary
Dynamic scene reconstruction from monocular video often struggles to balance global structural coherence with fine local detail because multi-view cues are limited. This work proposes WebSpline, a dynamic 3D Gaussian framework that models each Gaussian trajectory with a learnable cubic Hermite spline, the Structure-Informed Spline (SIS), and organizes motion through an auxiliary Structural Proxy Graph (SPG). Optimization proceeds in two stages: the SPG is first initialized from 2D point tracks and refined with temporal rigidity regularization, and the SIS is then initialized from the refined SPG and optimized under spatial and structural neighborhood constraints. At inference, motion comes from evaluating the learned splines alone, which keeps rendering fast. On the iPhone and NVIDIA monocular dynamic scene benchmarks, WebSpline achieves state-of-the-art rendering quality while rendering over 10 times faster than WorldTree, the second-best method on the iPhone dataset.
📝 Abstract
Dynamic scene reconstruction from monocular videos remains highly challenging, as existing methods often struggle to balance global structural coherence with local fine-grained detail under limited multi-view cues. To address this challenge, we propose WebSpline, a novel dynamic 3D Gaussian framework that enables structurally coherent, high-fidelity reconstruction from monocular videos with fast rendering. The core of WebSpline is the Structure-Informed Spline (SIS) representation, which models each dynamic Gaussian trajectory using a learnable cubic Hermite spline whose motion is structurally organized by an auxiliary Structural Proxy Graph (SPG). The framework is optimized in two stages: (i) the SPG is initialized from 2D point tracks and refined with temporal rigidity regularization to establish structural coherence for moving objects across the sequence; and (ii) the SIS representation is initialized from the refined SPG and optimized under both spatial and structural neighborhood constraints. At inference, Gaussian motion is obtained solely by evaluating the learned SIS, enabling fast rendering. Extensive experiments on two challenging monocular dynamic scene benchmarks, iPhone and NVIDIA, demonstrate that WebSpline achieves state-of-the-art rendering quality while rendering over 10 times faster than WorldTree, the second-best method on the iPhone dataset.
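To make the inference step concrete, the following is a minimal sketch of evaluating a piecewise cubic Hermite spline for a single Gaussian center. It is not the authors' implementation: the abstract does not specify knot placement, parameterization, or how rotations are handled, so the function name `hermite_position` and the arguments `knot_times`, `knot_positions`, and `knot_tangents` are hypothetical and chosen only for illustration. In a learned setting, the knot positions and tangents would be the optimized parameters.

```python
from bisect import bisect_right


def hermite_position(knot_times, knot_positions, knot_tangents, t):
    """Evaluate a piecewise cubic Hermite spline at time t.

    knot_times:     strictly increasing list of floats (hypothetical knot layout)
    knot_positions: one (x, y, z) tuple per knot
    knot_tangents:  one (dx/dt, dy/dt, dz/dt) tuple per knot
    """
    # Clamp t to the time range covered by the knots.
    t = min(max(t, knot_times[0]), knot_times[-1])

    # Locate the segment [t0, t1] that contains t.
    i = min(bisect_right(knot_times, t) - 1, len(knot_times) - 2)
    t0, t1 = knot_times[i], knot_times[i + 1]
    h = t1 - t0
    s = (t - t0) / h  # normalized time in [0, 1]

    # Standard cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2

    # Tangents are scaled by the segment length h because s is normalized.
    return tuple(
        h00 * p0 + h10 * h * m0 + h01 * p1 + h11 * h * m1
        for p0, m0, p1, m1 in zip(
            knot_positions[i],
            knot_tangents[i],
            knot_positions[i + 1],
            knot_tangents[i + 1],
        )
    )


# Usage: three knots describing a short trajectory with zero end tangents.
times = [0.0, 1.0, 2.0]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
tangents = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
midpoint = hermite_position(times, positions, tangents, 0.5)
```

Because each query touches only the two knots bounding `t`, the per-Gaussian cost is constant in the sequence length, which is consistent with the abstract's point that motion at inference comes solely from evaluating the learned spline.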