🤖 AI Summary
This work addresses the degradation of dynamic 3D Gaussian Splatting under full occlusion, where the loss of photometric supervision breaks object permanence. The authors propose PersistGS, a framework that couples differentiable rigid-body simulation with 3D Gaussian Splatting: it estimates friction and velocity from the pre-occlusion trajectory and uses the resulting SE(3) trajectory, which is consistent with rigid-body dynamics, to position the object's Gaussians during occlusion. This lets it capture contact events such as bounces and friction-induced deceleration. A centroid silhouette loss isolates positional gradients from appearance noise and yields 40% lower trajectory error than photometric supervision. On synthetic scenes, the method improves PSNR by 2.46 dB over constant-velocity extrapolation and comes within 0.19 dB of an upper bound that uses ground-truth trajectories.
📝 Abstract
Dynamic 3D Gaussian Splatting (3DGS) methods reconstruct time-varying scenes from synchronized multi-camera video using photometric supervision. When a moving object becomes fully occluded from all training cameras, this supervision vanishes: the Gaussians representing it receive no gradient signal and degrade. Existing approaches to incomplete observations in neural reconstruction rely on learned generative priors that prioritize visual plausibility over physical correctness.
We propose $\textbf{PersistGS}$, a method that restores object permanence during occlusion by coupling differentiable rigid-body simulation with 3D Gaussian Splatting. Our approach decomposes the scene into per-object Gaussians and collision meshes, estimates friction and velocity from the observed pre-occlusion trajectory via differentiable simulation, and uses the resulting SE(3) trajectory to position object Gaussians throughout the occlusion period. Because the predicted trajectory satisfies the governing equations of rigid-body dynamics, it faithfully captures contact events (bounces, friction-based deceleration, direction changes) that kinematic extrapolation cannot model. We introduce a centroid silhouette loss that isolates positional gradients from appearance noise, yielding 40% lower trajectory error than photometric supervision. We evaluate using cameras withheld from training that observe the object during its occlusion. Experiments on synthetic scenes show that PersistGS outperforms constant-velocity extrapolation by 2.46 dB PSNR and comes within 0.19 dB of a ground-truth trajectory upper bound.
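To make concrete why a physics-based trajectory can beat kinematic extrapolation during occlusion, here is a minimal one-dimensional sketch. It is not the PersistGS implementation: it assumes a block sliding on a flat surface under Coulomb friction, and because that toy model is linear in the unknowns, a closed-form least-squares fit stands in for the gradient-based optimization through a differentiable simulator that the abstract describes. All names (`slide`, `fit_velocity_and_friction`, the constants) are hypothetical.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def slide(x0, v0, mu, t):
    """Position at time t of a block sliding with Coulomb friction.

    Toy model: assumes v0 > 0 and a flat surface, so the deceleration is mu * G.
    """
    decel = mu * G
    t_stop = v0 / decel if decel > 0 else float("inf")
    t = min(t, t_stop)  # once friction has stopped the block, it stays put
    return x0 + v0 * t - 0.5 * decel * t * t


def fit_velocity_and_friction(times, positions):
    """Least-squares fit of (v0, mu) to x(t) - x(0) = v0*t - 0.5*mu*G*t^2.

    Assumes times[0] == 0 and that the block is still moving in every sample.
    """
    x0 = positions[0]
    a = list(times)                          # feature multiplying v0
    b = [-0.5 * G * t * t for t in times]    # feature multiplying mu
    y = [p - x0 for p in positions]
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    sbb = sum(bi * bi for bi in b)
    say = sum(ai * yi for ai, yi in zip(a, y))
    sby = sum(bi * yi for bi, yi in zip(b, y))
    det = saa * sbb - sab * sab              # 2x2 normal equations
    v0 = (say * sbb - sab * sby) / det
    mu = (saa * sby - sab * say) / det
    return v0, mu


# Synthetic "pre-occlusion" observations (illustrative values, not from the paper).
TRUE_V0, TRUE_MU = 2.0, 0.3
obs_times = [0.05 * k for k in range(7)]     # 0.0 s to 0.3 s
obs_pos = [slide(0.0, TRUE_V0, TRUE_MU, t) for t in obs_times]

v0_hat, mu_hat = fit_velocity_and_friction(obs_times, obs_pos)

# Query a time inside the hypothetical occlusion window.
t_query = 1.0
truth = slide(0.0, TRUE_V0, TRUE_MU, t_query)
physics_pred = slide(obs_pos[0], v0_hat, mu_hat, t_query)

# Constant-velocity baseline: extend the last observed finite-difference velocity.
last_vel = (obs_pos[-1] - obs_pos[-2]) / (obs_times[-1] - obs_times[-2])
const_vel_pred = obs_pos[-1] + last_vel * (t_query - obs_times[-1])

physics_err = abs(physics_pred - truth)
const_vel_err = abs(const_vel_pred - truth)
print(f"physics error: {physics_err:.4f} m, constant-velocity error: {const_vel_err:.4f} m")
```

In this toy setup the block comes to rest before the query time, so the constant-velocity baseline overshoots while the fitted friction model predicts the stop. The real method works with full SE(3) poses, collision meshes, and noisy image-derived observations, none of which this sketch attempts to model.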