$\texttt{WEAVER}$, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

📅 2026-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing world models struggle to simultaneously achieve high fidelity, long-term consistency, and computational efficiency, limiting their applicability in robotic manipulation. This work proposes WEAVER, a multi-view world model whose key design decisions span model architecture, memory mechanisms, and prediction objectives. WEAVER is trained with a flow-matching loss to predict future latents and reward values. On real robotic hardware, its policy evaluations correlate with real-world success rates at ρ = 0.870, policy improvement raises the real-world success rate by 38% on top of the π0.5 robot foundation model, and test-time planning raises the real-world success rate by 14% with a 5–10× speedup over prior world models. WEAVER also outperforms prior world models in out-of-distribution scenarios.
📝 Abstract
The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching -- policy evaluation, policy improvement, and test-time planning -- all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: $\textit{(i)}$ fidelity (i.e., producing simulated trajectories that correlate with reality), $\textit{(ii)}$ consistency (i.e., producing simulated trajectories that are coherent over long horizons), and $\textit{(iii)}$ efficiency (i.e., producing simulated trajectories quickly). We propose $\texttt{WEAVER}$ (World Estimation Across Views for Embodied Reasoning): a WM architecture that simultaneously achieves all three desiderata, providing state-of-the-art results on robotic manipulation tasks. $\texttt{WEAVER}$ is a multi-view WM trained to predict future latents and reward values via a flow-matching loss. We distill the key design decisions across model architecture, memory, and prediction objectives required to unlock the kinds of long-horizon dynamic manipulation tasks that have confounded prior world modeling approaches. We apply $\texttt{WEAVER}$ in robotic hardware, demonstrating its effectiveness at policy evaluation ($\rho = 0.870$ correlation with real-world success rate), policy improvement (real-world success rate improvement of $38\%$ on top of the $\pi_{0.5}$ robot foundation model), and test-time planning (real-world success rate improvement of $14\%$ with a $5$-$10\times$ speedup over prior WMs). $\texttt{WEAVER}$ also demonstrates better performance than prior WMs when evaluated on out-of-distribution scenarios. Code, models, and videos at: https://arnavkj1995.github.io/WEAVER/ .
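The abstract states that the model predicts future latents and reward values via a flow-matching loss but does not give the formulation. As a rough illustration only, the sketch below shows a generic conditional flow-matching loss with a straight-line interpolation path between noise and data. The function name `flow_matching_loss`, the `velocity_fn` callable, and the choice to concatenate latents and a scalar reward into one target vector are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def flow_matching_loss(velocity_fn, x1, rng):
    """Generic conditional flow-matching loss with a linear path (a sketch).

    x1:          target samples, shape (batch, dim). In this illustration it
                 stands in for future latents concatenated with a reward
                 value (an assumption, not the paper's stated layout).
    velocity_fn: hypothetical model, maps (x_t, t) to a predicted velocity
                 of shape (batch, dim).
    rng:         a numpy random Generator.
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # one time in [0, 1) per example
    x_t = (1.0 - t) * x0 + t * x1            # point on the straight-line path
    target_v = x1 - x0                       # constant velocity of that path
    pred_v = velocity_fn(x_t, t)
    return float(np.mean((pred_v - target_v) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy target: 8 latent dimensions plus 1 reward dimension (assumed sizes).
    latents = rng.standard_normal((4, 8))
    reward = rng.uniform(size=(4, 1))
    x1 = np.concatenate([latents, reward], axis=1)
    # A placeholder "model" that always predicts zero velocity.
    print(flow_matching_loss(lambda x_t, t: np.zeros_like(x_t), x1, rng))
```

In a trained model, `velocity_fn` would be a neural network conditioned on past observations and actions; at sampling time the predicted velocity would be integrated from noise at `t = 0` to a prediction at `t = 1`.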
Problem

Research questions and friction points this paper is trying to address.

world models
robotic manipulation
fidelity
consistency
efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

world model
flow-matching
multi-view prediction
long-horizon consistency
robotic manipulation