PLUME: Probabilistic Latent Unified World Modeling and Parameter Estimation for Multi-Finger Manipulation

📅 2026-06-09
🤖 AI Summary
Dexterous multi-finger manipulation is sensitive to physical parameters such as object shape, pose, and friction, so policies trained in simulation can degrade on real hardware, where the true parameters and dynamics are unknown. The authors propose PLUME, a probabilistic latent-variable world model that jointly learns a belief over physical parameters and dynamics conditioned on those parameters. A learned latent space jointly represents multiple qualitatively different physical parameters along with rewards, which lets PLUME infer the parameters online and align the world model to the true dynamics without re-training or fine-tuning. PLUME is evaluated on simulated screwdriver turning, valve turning, bucket lifting, and disk flicking tasks and on a hardware screwdriver turning task, where the simulation-trained policy transfers zero-shot and outperforms state-of-the-art offline reinforcement learning and world-model-augmented behavior cloning baselines.
📝 Abstract
Dexterous manipulation with multi-finger hands can be sensitive to physical parameters such as object shape, pose, and friction coefficients. While simulation enables large-scale data collection with known parameter values, simulation-trained policies must still handle uncertainty at deployment, where the true parameters and therefore the true dynamics are unknown. Standard domain randomization strategies may be insufficient for precise tasks like screwdriver turning, as manipulation strategies may need to change depending on specific parameter values. To address this, we propose Probabilistic Latent Unified world Modeling and parameter Estimation (PLUME), a world model that jointly learns to evolve a belief over parameter values as well as the system dynamics conditioned on those parameters. We learn a latent space to jointly represent multiple qualitatively different physical parameters along with rewards, themselves functions of partially-observable variables, to inform planning. Our novel learning framework leads to efficient alignment of the world model to true dynamics through online parameter inference as opposed to re-training or fine-tuning. We evaluate our method on simulated screwdriver turning, valve turning, bucket lifting, and disk flicking tasks, as well as a hardware screwdriver turning task, where we achieve successful zero-shot transfer of our simulation-trained policy and outperform state-of-the-art offline reinforcement learning and world-model-augmented behavior cloning baselines. Please see our website at https://plume-world-model.github.io for videos.
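The idea of evolving a belief over parameter values while conditioning the dynamics on those values can be illustrated with a deliberately simple sketch. The code below is not PLUME's learned latent model: it assumes a hypothetical 1-D pushed block whose only unknown parameter is a friction coefficient, a hand-written dynamics function `predict`, Gaussian observation noise, and a discrete grid of candidate values updated with Bayes' rule. All names and numbers are illustrative assumptions.

```python
import math
import random


def predict(v, u, mu, dt=0.1, g=9.81):
    """Toy parameter-conditioned dynamics (assumed, not from the paper):
    next velocity of a block pushed with force-per-mass u under friction mu."""
    return v + dt * (u - mu * g)


def gaussian_log_lik(obs, mean, sigma):
    """Log density of obs under a Gaussian with the given mean and std."""
    return -0.5 * ((obs - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def update_log_belief(log_belief, candidates, v, u, v_next, sigma):
    """One Bayes step: posterior is proportional to prior times likelihood.
    Kept in log space so very unlikely candidates do not cause log(0)."""
    log_post = [
        lb + gaussian_log_lik(v_next, predict(v, u, mu), sigma)
        for lb, mu in zip(log_belief, candidates)
    ]
    m = max(log_post)
    log_norm = m + math.log(sum(math.exp(lp - m) for lp in log_post))
    return [lp - log_norm for lp in log_post]


def run_demo(true_mu=0.6, steps=50, sigma=0.05, seed=0):
    """Simulate the 'real' system with true_mu and infer it online."""
    rng = random.Random(seed)
    candidates = [0.2, 0.4, 0.6, 0.8]  # hypothetical friction grid
    log_belief = [math.log(1.0 / len(candidates))] * len(candidates)  # uniform prior
    v = 0.0
    for _ in range(steps):
        u = rng.uniform(5.0, 10.0)  # arbitrary exploratory action
        v_next = predict(v, u, true_mu) + rng.gauss(0.0, sigma)  # noisy observation
        log_belief = update_log_belief(log_belief, candidates, v, u, v_next, sigma)
        v = v_next
    return candidates, [math.exp(lb) for lb in log_belief]


if __name__ == "__main__":
    candidates, belief = run_demo()
    for mu, p in zip(candidates, belief):
        print(f"mu={mu:.1f}  belief={p:.3f}")
```

In this sketch the dynamics model is never re-fit: only the belief over the parameter changes as observations arrive, which loosely mirrors the abstract's contrast between online parameter inference and re-training or fine-tuning. A planner could then weight predictions from `predict` by the current belief.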
Problem

Research questions and friction points this paper is trying to address.

dexterous manipulation
physical parameter uncertainty
multi-finger hands
simulation-to-reality transfer
parameter estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

world model
parameter estimation
dexterous manipulation
online inference
zero-shot transfer