AI Summary
This work addresses the challenges of cross-environment continual dynamics modeling and zero-shot action adaptation. We propose WLA, the first continuous latent action representation framework grounded in Lie group theory. WLA abandons discrete action assumptions and instead models action dynamics on Lie group manifolds, achieving semantic disentanglement of actions and joint representation learning across environments. Combined with an object-centric autoencoder and unsupervised continuous action learning, it requires only raw video frames for training: no action labels or environment-specific supervision. Evaluated on both synthetic and real-world datasets, WLA demonstrates significantly improved cross-environment generalization, enables rapid adaptation to unseen environments and novel action classes, and reduces reliance on action labels to near zero. To our knowledge, WLA is the first method to unify high controllability, strong predictive accuracy, and robust transferability in a single continuous action representation framework.
Abstract
Many world models being developed today are autoregressive frameworks that rely on discrete representations of actions and observations, and they have succeeded in constructing interactive generative models for a single target environment of interest. Humans, by contrast, demonstrate a remarkable ability to generalize: they combine experiences from multiple environments to mentally simulate, and learn to control, agents in diverse settings. Inspired by this capability, we introduce World modeling through Lie Action (WLA), an unsupervised framework that learns continuous latent action representations that transfer across environments. WLA learns a control interface with high controllability and predictive ability by simultaneously modeling the dynamics of multiple environments using Lie group theory and an object-centric autoencoder. On synthetic benchmarks and real-world datasets, we demonstrate that WLA can be trained using only video frames and, with minimal or no action labels, can quickly adapt to new environments with novel action sets.
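To make the Lie-group view of continuous actions concrete, below is a minimal toy sketch, not the WLA implementation. It uses SO(2), the simplest Lie group, as a stand-in for the learned action manifold: a latent action is a scalar in the Lie algebra so(2), the exponential map turns it into a group element (a rotation matrix) that acts on a latent state, and composing two actions reduces to addition in the algebra. The names `exp_so2` and `apply_action` are hypothetical and chosen for illustration only.

```python
import numpy as np

def exp_so2(theta: float) -> np.ndarray:
    """Exponential map from so(2) (a scalar) to SO(2) (a 2x2 rotation matrix)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def apply_action(z: np.ndarray, a: float) -> np.ndarray:
    """Act on a 2-D latent state z with the group element exp(a)."""
    return exp_so2(a) @ z

# A continuous action is a point in the algebra, not a discrete token:
z = np.array([1.0, 0.0])
z_two_steps = apply_action(apply_action(z, 0.3), 0.2)  # two small actions
z_one_step  = apply_action(z, 0.5)                     # one combined action

# For this (abelian) group, composing actions is addition in the algebra.
assert np.allclose(z_two_steps, z_one_step)
```

In WLA the group and its action on the latent space are learned from video rather than fixed to rotations, but the same structure applies: actions live on a smooth manifold, so nearby actions produce nearby transitions, which is what enables interpolation to unseen action values.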