HiMem-WAM: Hierarchical Memory-Gated World Action Models for Robotic Manipulation

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing World Action Models (WAMs) struggle to use task-relevant memory in long-horizon robotic manipulation. This work proposes HiMem-WAM, a hierarchical memory-gated WAM that jointly learns latent variables for low-level motion and high-level skills, giving the model a structured temporal abstraction. A boundary-aware memory gate writes compact task states only at predicted skill boundaries, which keeps memory updates sparse and enables causal inference without generating future video or estimating optical flow at test time. Experiments on LIBERO, LIBERO-PLUS, RMBench, and real-world tasks indicate that the hierarchical latents improve robustness under deployment perturbations, while the memory module substantially benefits memory-dependent long-horizon manipulation.

📝 Abstract

World Action Models (WAMs) have emerged as a powerful new paradigm for embodied intelligence, learning action-relevant visual dynamics that significantly enhance generalization and robustness. However, existing WAMs still struggle with task-relevant memory in long-horizon robotic manipulation. To address this, we present HiMem-WAM, a Hierarchical Memory-Gated WAM that integrates motion-centric latent actions, high-level skill latents, and boundary-triggered memory updates. Specifically, we develop a hierarchical latent action framework that jointly learns low-level motion and high-level skill latents, providing structured temporal abstraction. Meanwhile, a boundary-aware memory gate writes compact task states at predicted skill transitions, enabling causal inference without generating future video or estimating optical flow at test time. Evaluated on LIBERO, LIBERO-PLUS, RMBench, and real-world tasks, HiMem-WAM shows that hierarchical latents improve robustness under deployment perturbations, and that the memory module substantially benefits memory-dependent long-horizon manipulation.
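
The boundary-triggered memory write described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `BoundaryGatedMemory`, the fixed probability threshold, the bounded first-in-first-out slot buffer, and the list-of-floats task state are all assumptions made for illustration. The abstract does not specify how boundaries are scored or how states are stored. The sketch shows only the general idea that a write happens when a predicted skill-boundary score is high, using information available at the current step.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Sequence, Tuple


@dataclass
class BoundaryGatedMemory:
    """Toy memory written only at predicted skill boundaries (illustrative)."""

    threshold: float = 0.5  # assumed cutoff on the boundary probability
    capacity: int = 8       # assumed maximum number of stored task states
    slots: Deque[Tuple[int, List[float]]] = field(default_factory=deque)

    def step(self, t: int, boundary_prob: float,
             task_state: Sequence[float]) -> bool:
        """Write task_state at step t if a boundary is predicted.

        Uses only the current step's inputs, so the update is causal.
        Returns True when a write happened.
        """
        if boundary_prob < self.threshold:
            return False
        if len(self.slots) == self.capacity:
            self.slots.popleft()  # evict the oldest entry when full
        self.slots.append((t, list(task_state)))
        return True

    def read(self) -> List[List[float]]:
        """Return stored task states, oldest first."""
        return [state for _, state in self.slots]


# Hypothetical per-step boundary probabilities from a skill-level predictor.
boundary_probs = [0.1, 0.9, 0.2, 0.7, 0.8]
memory = BoundaryGatedMemory(threshold=0.5, capacity=2)
for t, p in enumerate(boundary_probs):
    memory.step(t, p, [float(t)])  # placeholder one-dimensional task state

written_steps = [t for t, _ in memory.slots]
print(written_steps)
```

Because writes happen only at the gated steps, memory grows with the number of predicted skill transitions rather than with the number of frames, which is one plausible reading of the "compact task states" the abstract mentions.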

Problem

Research questions and friction points this paper is trying to address.

World Action Models

long-horizon manipulation

task-relevant memory

robotic manipulation

memory

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Latent Actions

Memory-Gated WAM

Skill Transition Boundaries