🤖 AI Summary
This work addresses the challenging problem of 3D depth reconstruction for transparent objects under sparse-view and dynamic-scene conditions. We propose a physics-informed, object-aware Gaussian splatting method that jointly leverages physical simulation and geometric reasoning. To enhance transparency-aware reconstruction, we employ foreground-background separation to focus optimization on transparent-object regions. A physics-driven scene update mechanism enables robust handling of object removal, coupled motion, and occlusion compensation, without requiring re-scanning. An object-aware loss function is designed and integrated into a 2D Gaussian splatting framework augmented with real-time physics simulation, enabling end-to-end optimization. Evaluated on the synthetic TRansPose dataset, our method reduces mean absolute error (MAE) by over 39% compared to baselines. Notably, using only a single updated image, it achieves a δ < 2.5 cm accuracy of 48.46%, more than 1.5 times that of baselines that use six images.
📝 Abstract
Understanding the 3D geometry of transparent objects from RGB images is challenging due to their inherent physical properties, such as reflection and refraction. To address these difficulties, especially in scenarios with sparse views and dynamic environments, we introduce TRAN-D, a novel 2D Gaussian Splatting-based depth reconstruction method for transparent objects. Our key insight lies in separating transparent objects from the background, enabling focused optimization of Gaussians corresponding to the object. We mitigate artifacts with an object-aware loss that places Gaussians in obscured regions, ensuring coverage of invisible surfaces while reducing overfitting. Furthermore, we incorporate a physics-based simulation that refines the reconstruction in just a few seconds, effectively handling object removal and chain-reaction movement of remaining objects without the need for rescanning. TRAN-D is evaluated on both synthetic and real-world sequences, and it consistently demonstrates robust improvements over existing GS-based state-of-the-art methods. Compared with baselines, TRAN-D reduces the mean absolute error by over 39% on the synthetic TRansPose sequences. Furthermore, despite being updated using only one image, TRAN-D reaches a δ < 2.5 cm accuracy of 48.46%, over 1.5 times that of baselines, which use six images. Code and more results are available at https://jeongyun0609.github.io/TRAN-D/.
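The reported numbers use two standard depth-reconstruction metrics: mean absolute error (MAE) and threshold accuracy (the fraction of pixels whose depth error falls below δ, here 2.5 cm). The sketch below shows how these metrics are conventionally computed from predicted and ground-truth depth maps; the function name and the convention of masking out zero-depth pixels are illustrative assumptions, not the paper's actual evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt, delta_thresh=0.025):
    """Compute MAE and delta-threshold accuracy for depth maps.

    pred, gt: arrays of per-pixel depth in meters.
    delta_thresh: error threshold in meters (0.025 m = 2.5 cm).
    Pixels with gt == 0 are treated as invalid and ignored
    (a common convention; the paper's masking may differ).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    err = np.abs(pred[valid] - gt[valid])
    mae = err.mean()                          # mean absolute error (m)
    delta_acc = (err < delta_thresh).mean()   # fraction within delta
    return mae, delta_acc
```

For example, a δ < 2.5 cm accuracy of 48.46% means that 48.46% of valid pixels have a predicted depth within 2.5 cm of the ground truth.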