IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing dynamic driving scene reconstruction methods rely either on labor-intensive manual trajectory annotation or on time-varying representations that lack instance-level disentanglement, hindering robust separation of static and dynamic elements. To address this, we propose IDSplat, a self-supervised 3D Gaussian Splatting framework for dynamic scene reconstruction. Our method models the motion of dynamic objects via rigid transformations, and integrates zero-shot language-guided video tracking with LiDAR feature alignment to achieve label-free, instance-level decomposition and motion-consistent optimization. Key technical components include 3D Gaussian Splatting, feature-matching-based pose estimation, coordinated-turn smoothing, and joint optimization of object poses and Gaussian parameters. Evaluated on the Waymo Open Dataset, our approach significantly improves reconstruction fidelity and trajectory continuity, enabling high-quality scene decomposition and perception-aware simulation. The framework demonstrates strong generalization capability and scalability for large-scale deployment.
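The core idea of treating each dynamic object as a coherent instance under rigid motion can be sketched as a per-frame SE(3) transform applied to that instance's canonical Gaussian centers. The function name and toy pose values below are illustrative, not from the paper:

```python
import numpy as np

def transform_instance(means, R, t):
    """Apply a rigid SE(3) transform (R, t) to one instance's canonical
    Gaussian centers, moving the whole object coherently per frame.

    means: (N, 3) canonical Gaussian centers of one object instance
    R:     (3, 3) rotation matrix for the current frame
    t:     (3,)   translation vector for the current frame
    """
    return means @ R.T + t

# Toy example: rotate an instance 90 degrees about the z-axis, then shift it.
means = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([5.0, 0.0, 0.0])
moved = transform_instance(means, R, t)
```

Because every Gaussian of the instance shares the same (R, t), the optimization has far fewer motion parameters than per-primitive time-varying deformation, which is what enables the clean static/dynamic separation described above.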

📝 Abstract
Reconstructing dynamic driving scenes is essential for developing autonomous systems through sensor-realistic simulation. Although recent methods achieve high-fidelity reconstructions, they either rely on costly human annotations for object trajectories or use time-varying representations without explicit object-level decomposition, leading to intertwined static and dynamic elements that hinder scene separation. We present IDSplat, a self-supervised 3D Gaussian Splatting framework that reconstructs dynamic scenes with explicit instance decomposition and learnable motion trajectories, without requiring human annotations. Our key insight is to model dynamic objects as coherent instances undergoing rigid transformations, rather than unstructured time-varying primitives. For instance decomposition, we employ zero-shot, language-grounded video tracking anchored to 3D using lidar, and estimate consistent poses via feature correspondences. We introduce a coordinated-turn smoothing scheme to obtain temporally and physically consistent motion trajectories, mitigating pose misalignments and tracking failures, followed by joint optimization of object poses and Gaussian parameters. Experiments on the Waymo Open Dataset demonstrate that our method achieves competitive reconstruction quality while maintaining instance-level decomposition and generalizes across diverse sequences and view densities without retraining, making it practical for large-scale autonomous driving applications. Code will be released.
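The abstract's "estimate consistent poses via feature correspondences" step could, for example, use the classic Kabsch (SVD-based) rigid alignment between matched 3D points of an instance in two frames. The paper does not specify its exact solver, so the following is a plausible minimal sketch:

```python
import numpy as np

def estimate_rigid_pose(src, dst):
    """Estimate the rigid transform (R, t) mapping matched 3D points
    src -> dst via the Kabsch algorithm (SVD of the cross-covariance).

    src, dst: (N, 3) corresponding points of one object instance in two frames.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Toy check: recover a known 90-degree yaw and translation.
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
src = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
dst = src @ R_true.T + t_true
R_est, t_est = estimate_rigid_pose(src, dst)
```

In a pipeline like the one described, `src` and `dst` would come from lidar points associated with the same tracked instance mask in consecutive frames; the per-frame poses then initialize the learnable trajectories before joint optimization.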
Problem

Research questions and friction points this paper is trying to address.

Reconstructing dynamic driving scenes without human annotations for autonomous systems
Separating intertwined static and dynamic elements in scene reconstruction
Achieving instance-level decomposition with consistent motion trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised 3D Gaussian Splatting framework
Zero-shot language-grounded video tracking
Coordinated-turn smoothing for motion trajectories
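The coordinated-turn smoothing idea can be illustrated with a constant-speed, constant-turn-rate motion model fitted to noisy 2D object positions and then re-integrated. The function name and exact formulation below are hypothetical simplifications, not the paper's scheme:

```python
import numpy as np

def coordinated_turn_smooth(xy, dt):
    """Smooth a noisy 2D trajectory under a coordinated-turn model:
    constant speed and constant turn rate (heading change per second).

    xy: (T, 2) noisy object positions; dt: time step between frames.
    Returns a re-integrated trajectory using the averaged speed/turn rate.
    """
    deltas = np.diff(xy, axis=0)                         # per-step motion
    headings = np.unwrap(np.arctan2(deltas[:, 1], deltas[:, 0]))
    speed = np.linalg.norm(deltas, axis=1).mean() / dt   # constant speed
    omega = np.diff(headings).mean() / dt if len(headings) > 1 else 0.0
    # Re-integrate from the first position with the smoothed parameters.
    out = [xy[0]]
    h = headings[0]
    for _ in range(len(xy) - 1):
        out.append(out[-1] + speed * dt * np.array([np.cos(h), np.sin(h)]))
        h += omega * dt
    return np.array(out)

# A perfectly straight track is reproduced exactly (turn rate = 0).
track = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
smoothed = coordinated_turn_smooth(track, dt=1.0)
```

The appeal of such a model for driving scenes is that vehicles mostly move at near-constant speed along circular arcs, so outlier poses from tracking failures get pulled back onto a physically plausible trajectory.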