Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

📅 2026-06-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
Most end-to-end autonomous driving methods map states directly to actions and do not explicitly model action-conditioned dynamics, which limits causal reasoning about counterfactual futures. The authors propose a unified discrete vision-action token representation that encodes future visual observations and ego-vehicle actions as aligned tokens in a shared discrete latent space. On top of this representation, a shared discrete diffusion framework jointly formulates world modeling, world-action policy learning, and hierarchical decision-making. According to the abstract, this supports compositional generalization across diverse driving scenarios, controllable generation, and counterfactual reasoning. On large-scale autonomous driving benchmarks, the method is reported to achieve competitive performance.
📝 Abstract
Autonomous driving requires reasoning about how ego actions shape the evolution of the surrounding world. However, most end-to-end methods rely on direct state-to-action mappings, capturing correlations without explicitly modeling action-conditioned dynamics. Conversely, continuous-latent world models often lack compositional structure for causal reasoning across counterfactual futures. We introduce Discrete-WAM, a unified latent vision-action world policy that represents future visual states and ego actions as aligned discrete tokens, enabling compositional causal reasoning across alternative futures. Built upon this unified discrete alignment, Discrete-WAM establishes a shared discrete diffusion framework with unified generative tasks, jointly formulating world modeling, world-action policy, and hierarchical decision-enabled policy, supporting compositional generalization across diverse driving scenarios. Experiments on large-scale autonomous-driving benchmarks show that Discrete-WAM achieves competitive performance while supporting controllable generation and counterfactual reasoning, offering a principled path toward more reliable decision-making.
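As a rough illustration of what "aligned discrete tokens" with "unified generative tasks" could look like, the sketch below places visual and action codes in one shared vocabulary and selects a generative task purely by which positions are masked. This is a minimal sketch under stated assumptions, not the authors' implementation: the vocabulary sizes, the `MASK_ID` absorbing token, the function names, and the three task names are all hypothetical, and the abstract does not say that Discrete-WAM uses mask-based (absorbing-state) discrete diffusion or how its three tasks map onto masking patterns.

```python
# Hypothetical sizes; none of these values come from the paper.
VISUAL_VOCAB = 1024   # number of discrete visual codes (assumed)
ACTION_VOCAB = 256    # number of discrete ego-action codes (assumed)
MASK_ID = VISUAL_VOCAB + ACTION_VOCAB  # extra absorbing [MASK] token (assumed)


def action_token(action_code):
    """Shift an action code so it sits after the visual codes in one vocabulary."""
    if not 0 <= action_code < ACTION_VOCAB:
        raise ValueError("action code out of range")
    return VISUAL_VOCAB + action_code


def build_sequence(visual_codes, action_codes):
    """Concatenate future visual tokens and ego-action tokens into one sequence."""
    return list(visual_codes) + [action_token(a) for a in action_codes]


def mask_for_task(seq, n_visual, task):
    """Return a copy of seq with the positions to be generated set to MASK_ID.

    task (hypothetical names):
      "predict_visual": mask visual tokens, keep actions (action-conditioned future)
      "predict_action": mask action tokens, keep visual tokens
      "predict_both":   mask everything (joint generation)
    """
    if task not in ("predict_visual", "predict_action", "predict_both"):
        raise ValueError(f"unknown task: {task}")
    out = list(seq)
    for i in range(len(seq)):
        is_visual = i < n_visual
        if task == "predict_both":
            out[i] = MASK_ID
        elif task == "predict_visual" and is_visual:
            out[i] = MASK_ID
        elif task == "predict_action" and not is_visual:
            out[i] = MASK_ID
    return out


seq = build_sequence([3, 7], [1])             # two visual tokens, one action token
print(mask_for_task(seq, 2, "predict_visual"))
print(mask_for_task(seq, 2, "predict_action"))
```

In this toy setup, swapping the unmasked action tokens while regenerating the masked visual tokens is one way a counterfactual query ("what would the scene look like under a different action?") could be posed; a denoising model over the shared vocabulary would fill in the masked positions.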
Problem

Research questions and friction points this paper is trying to address.

world modeling
action-conditioned dynamics
counterfactual reasoning
compositional generalization
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete token
world-action modeling
compositional reasoning
diffusion framework
counterfactual reasoning