Learning Action-Conditional and Object-Centric Gaussian Splatting World Models for Rigid Objects

📅 2026-06-01
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge of predicting the future states of rigid objects in 3D environments, including occluded or partially observed ones, conditioned on an agent's actions. To this end, the authors propose an object-centric Gaussian world model (MRO-GWM) that represents each object as a set of Gaussians in a canonical coordinate frame, so that object motion can be described as a rigid body transformation. A spatio-temporal Transformer predicts future rigid-body motion from a history of object Gaussians and future actions, and the model is trained on reconstructions from multiple viewpoints. Evaluated on synthetic datasets of typical household objects, the approach achieves high-fidelity multi-step state predictions and performs effectively in non-prehensile robotic manipulation tasks in simulation, handling occlusions and multi-object interactions.
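To illustrate what a canonical-frame representation buys, the sketch below applies one rigid body transformation to every Gaussian of an object: each mean maps to R·μ + t and each covariance to R·Σ·Rᵀ. This is standard geometry rather than the paper's code; the helper names (`rot_z`, `transform_gaussian`) and the two-Gaussian toy object are assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*a)]

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform_gaussian(mean, cov, R, t):
    """Move one canonical-frame Gaussian by the rigid transform (R, t)."""
    new_mean = [sum(R[i][k] * mean[k] for k in range(3)) + t[i] for i in range(3)]
    new_cov = matmul(matmul(R, cov), transpose(R))  # R * Sigma * R^T
    return new_mean, new_cov

# Hypothetical toy object: two Gaussians (mean, covariance) in the canonical frame.
canonical = [
    ([1.0, 0.0, 0.0], [[0.04, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]),
    ([0.0, 0.5, 0.0], [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]),
]

# One pose for the whole object: rotate 90 degrees about z, then lift by 1 along z.
R = rot_z(math.pi / 2)
t = [0.0, 0.0, 1.0]
posed = [transform_gaussian(m, c, R, t) for m, c in canonical]
```

Because the shape lives in the canonical frame, a dynamics model only has to predict one pose (R, t) per object and time step instead of new parameters for every Gaussian.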
📝 Abstract
World models enable intelligent agents to predict the consequences of their actions on the environment. In this paper, we propose Multi Rigid Object Gaussian World Model (MRO-GWM), a novel model that learns action-conditional dynamics of rigid objects in 3D. By representing the scene with object-centric Gaussians, we can represent arbitrary object shapes and multi-object scenes. We develop a novel spatio-temporal transformer architecture that predicts future rigid body motion from a history of object Gaussians and future actions. Objects are represented by their Gaussians in a canonical frame, which allows for describing object motion as a rigid body transformation. Our model is trained on reconstructions from multiple viewpoints, which requires the model to handle partial observations of objects due to occlusions. We analyze the prediction performance of our approach on synthetic datasets composed of typical household objects with multi-object dynamics and interactions by a robot end effector. We also evaluate our model in model-predictive control for non-prehensile manipulation in simulation.
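The abstract does not say which planner is used for model-predictive control, so the following is only a minimal sketch of how a world model can drive pushing: a random-shooting MPC loop, chosen here as a simple stand-in. The function `predict_next_state` is a hypothetical toy replacement for the learned model (the push simply displaces a 2D object position), and all names and parameter values are assumptions.

```python
import math
import random

def predict_next_state(state, action):
    """Toy stand-in for a learned world model: the push displaces the object."""
    return (state[0] + action[0], state[1] + action[1])

def distance(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def plan_action(state, goal, rng, horizon=5, num_samples=64, max_push=0.1):
    """Random shooting: sample action sequences, roll each out through the
    model, and return the first action of the lowest-cost sequence."""
    best_cost, best_first = float("inf"), (0.0, 0.0)
    for _ in range(num_samples):
        seq = [(rng.uniform(-max_push, max_push), rng.uniform(-max_push, max_push))
               for _ in range(horizon)]
        s, cost = state, 0.0
        for a in seq:
            s = predict_next_state(s, a)
            cost += distance(s, goal)  # accumulate distance to goal over the rollout
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

rng = random.Random(0)  # fixed seed so the run is repeatable
state, goal = (0.0, 0.0), (1.0, 0.5)
initial_distance = distance(state, goal)

# Receding horizon: replan at every step and execute only the first action.
for _ in range(30):
    action = plan_action(state, goal, rng)
    state = predict_next_state(state, action)

final_distance = distance(state, goal)
```

In a setting like the paper's, the state would be the object Gaussians and their predicted rigid body poses rather than a 2D point, and the cost would compare predicted and target object poses; the replanning loop itself keeps the same structure.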
Problem

Research questions and friction points this paper is trying to address.

world models
rigid objects
action-conditional dynamics
object-centric representation
3D scene prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
Object-Centric Representation
Rigid Body Dynamics
Spatio-Temporal Transformer
World Model