Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning

📅 2025-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing world models suffer from inefficient training and poor generalization due to direct modeling of raw environmental variables (e.g., pixels, physical states). To address this, we propose a multimodal latent-variable world model grounded in the Global Workspace (GW) theory from cognitive science. Our approach performs cross-modal representation learning and “mental simulation” within a high-dimensional joint latent space, leveraging GW mechanisms for cross-modal information broadcasting and adaptive inference under missing modalities. Built upon the Dreamer framework, it integrates multimodal encoders (for images and simulated attributes) with a latent dynamics model. Experiments demonstrate substantial reductions in environment interaction steps and robust policy performance even when one modality is absent—whereas baseline methods fail completely. This work constitutes the first integration of Global Workspace theory into world model architecture, establishing a novel paradigm for interpretable and highly generalizable reinforcement learning planning.
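The core GW mechanism described above, projecting each modality into a shared latent space and fusing whatever is present, can be sketched as follows. This is a minimal illustrative assumption, not the paper's actual architecture: the encoders are stand-in random linear maps, the dimensions are invented, and fusion is a simple average over available modality latents.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 16-d image features, 8-d attribute features, 4-d GW latent.
D_IMG, D_ATTR, D_GW = 16, 8, 4

# Random linear maps standing in for trained modality encoders.
W_img = rng.normal(size=(D_IMG, D_GW))
W_attr = rng.normal(size=(D_ATTR, D_GW))

def encode_to_gw(image=None, attributes=None):
    """Project whichever modalities are present into the shared GW latent,
    then fuse by averaging -- adaptive inference under missing modalities."""
    latents = []
    if image is not None:
        latents.append(image @ W_img)
    if attributes is not None:
        latents.append(attributes @ W_attr)
    if not latents:
        raise ValueError("at least one modality is required")
    return np.mean(latents, axis=0)

img = rng.normal(size=D_IMG)
attr = rng.normal(size=D_ATTR)

z_both = encode_to_gw(image=img, attributes=attr)
z_img_only = encode_to_gw(image=img)          # attribute modality absent
z_attr_only = encode_to_gw(attributes=attr)   # image modality absent
```

Because every modality lands in the same low-dimensional GW space, downstream components (here, the Dreamer-style dynamics model and policy) can consume the fused latent regardless of which inputs were observed, which is what gives the robustness to a missing modality.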

📝 Abstract
Humans leverage rich internal models of the world to reason about the future, imagine counterfactuals, and adapt flexibly to new situations. In Reinforcement Learning (RL), world models aim to capture how the environment evolves in response to the agent's actions, facilitating planning and generalization. However, typical world models directly operate on the environment variables (e.g. pixels, physical attributes), which can make their training slow and cumbersome; instead, it may be advantageous to rely on high-level latent dimensions that capture relevant multimodal variables. Global Workspace (GW) Theory offers a cognitive framework for multimodal integration and information broadcasting in the brain, and recent studies have begun to introduce efficient deep learning implementations of GW. Here, we evaluate the capabilities of an RL system combining GW with a world model. We compare our GW-Dreamer with various versions of the standard PPO and the original Dreamer algorithms. We show that performing the dreaming process (i.e., mental simulation) inside the GW latent space allows for training with fewer environment steps. As an additional emergent property, the resulting model (but not its comparison baselines) displays strong robustness to the absence of one of its observation modalities (images or simulation attributes). We conclude that the combination of GW with World Models holds great potential for improving decision-making in RL agents.
Problem

Research questions and friction points this paper is trying to address.

Enhance RL training efficiency using high-level latent dimensions.
Improve robustness in RL with missing observation modalities.
Combine Global Workspace Theory with World Models for better decision-making.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines Global Workspace theory with world models.
Uses high-level latent dimensions for training efficiency.
Enhances robustness to missing observation modalities.
Léopold Maytié
CerCo, CNRS UMR5549, Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse
Roland Bertin Johannet
CerCo, CNRS UMR5549, Université de Toulouse
Rufin VanRullen
Research Director, CNRS, CerCo, ANITI, TMBI, Univ. Toulouse
AI · Neural Networks · Visual Perception · Attention · Oscillations