🤖 AI Summary
Existing world models struggle to achieve high fidelity, long-horizon consistency, and computational efficiency at the same time, which limits their usefulness for robotic manipulation. This work proposes WEAVER, a multi-view world model that combines design decisions across model architecture, memory, and prediction objectives in a single framework. WEAVER is trained with a flow-matching loss to predict future latents and reward values. On real-world robotic tasks, its simulated results correlate with real-world success rates at ρ = 0.870 for policy evaluation, it improves real-world success rate by 38% on top of the π0.5 robot foundation model for policy improvement, and it raises real-world success rate by 14% in test-time planning while running 5–10× faster than prior world models. It also outperforms prior world models in out-of-distribution scenarios.
📝 Abstract
The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching -- policy evaluation, policy improvement, and test-time planning -- all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: $\textit{(i)}$ fidelity (i.e., producing simulated trajectories that correlate with reality), $\textit{(ii)}$ consistency (i.e., producing simulated trajectories that are coherent over long horizons), and $\textit{(iii)}$ efficiency (i.e., producing simulated trajectories quickly). We propose $\texttt{WEAVER}$ (World Estimation Across Views for Embodied Reasoning): a WM architecture that simultaneously achieves all three desiderata, providing state-of-the-art results on robotic manipulation tasks. $\texttt{WEAVER}$ is a multi-view WM trained to predict future latents and reward values via a flow-matching loss. We distill the key design decisions across model architecture, memory, and prediction objectives required to unlock the kinds of long-horizon dynamic manipulation tasks that have confounded prior world modeling approaches. We apply $\texttt{WEAVER}$ on robotic hardware, demonstrating its effectiveness at policy evaluation ($\rho = 0.870$ correlation with real-world success rate), policy improvement (real-world success rate improvement of $38\%$ on top of the $\pi_{0.5}$ robot foundation model), and test-time planning (real-world success rate improvement of $14\%$ with a $5$-$10\times$ speedup over prior WMs). $\texttt{WEAVER}$ also demonstrates better performance than prior WMs when evaluated on out-of-distribution scenarios. Code, models, and videos at: https://arnavkj1995.github.io/WEAVER/ .
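To make the flow-matching objective mentioned in the abstract concrete, here is a minimal sketch of one common form of the loss. It is an illustration under stated assumptions, not WEAVER's implementation: the abstract does not specify the probability path or the network, so the straight-line path between a noise sample and the target, the stacking of the future latent and reward into one target vector, and every name below (`flow_matching_loss`, `oracle`, `zero_model`) are hypothetical.

```python
import random


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between a predicted and a target velocity.

    Assumes a straight-line path from noise x0 (t=0) to target x1 (t=1).
    """
    # Point on the straight line between x0 and x1 at time t.
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Along a straight line the velocity is constant: x1 - x0.
    target = [b - a for a, b in zip(x0, x1)]
    pred = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


# Hypothetical target: a 3-dim future latent concatenated with a scalar reward.
future_latent = [0.2, -0.1, 0.5]
reward = [1.0]
x1 = future_latent + reward

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
t = 0.3  # in training, t would typically be sampled uniformly from [0, 1]


def oracle(x_t, t):
    # Knows x1, so it can recover the exact velocity (x1 - x_t) / (1 - t).
    return [(b - xt) / (1.0 - t) for b, xt in zip(x1, x_t)]


def zero_model(x_t, t):
    # Baseline that always predicts zero velocity.
    return [0.0] * len(x_t)


oracle_loss = flow_matching_loss(oracle, x0, x1, t)
zero_loss = flow_matching_loss(zero_model, x0, x1, t)
print(oracle_loss, zero_loss)
```

In this sketch the oracle reaches essentially zero loss because it outputs the true path velocity, while the zero predictor pays the mean squared distance between noise and target. A learned model would replace `predict_velocity` with a network conditioned on past observations and actions, which is outside what the abstract describes.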