World Model Agents with Change-Based Intrinsic Motivation

📅 2025-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses insufficient exploration in reinforcement learning under sparse rewards. It adapts Change-Based Exploration Transfer (CBET), an intrinsic motivation and transfer technique originally developed for model-free agents, to the world model algorithm DreamerV3, and evaluates DreamerV3 and IMPALA agents with and without CBET on Crafter and Minigrid. In the complex, long-horizon Crafter environment, CBET improves DreamerV3's returns. In the simpler Minigrid environment, however, DreamerV3 reaches a suboptimal policy and CBET reduces returns further, and pre-training with intrinsic rewards does not immediately yield a policy that maximizes extrinsic rewards. The takeaway is that intrinsic motivation must be matched to task structure: it can help in complex settings but be detrimental in simpler ones where the behaviours it promotes do not align with task objectives.

📝 Abstract
Sparse reward environments pose a significant challenge for reinforcement learning due to the scarcity of feedback. Intrinsic motivation and transfer learning have emerged as promising strategies to address this issue. Change-Based Exploration Transfer (CBET), a technique that combines these two approaches for model-free algorithms, has shown potential in addressing sparse feedback, but its effectiveness with modern algorithms remains understudied. This paper provides an adaptation of CBET for world model algorithms like DreamerV3 and compares the performance of DreamerV3 and IMPALA agents, both with and without CBET, in the sparse reward environments of Crafter and Minigrid. Our tabula rasa results highlight the possibility of CBET improving DreamerV3's returns in Crafter, but the algorithm attains a suboptimal policy in Minigrid, with CBET further reducing returns. In the same vein, our transfer learning experiments show that pre-training DreamerV3 with intrinsic rewards does not immediately lead to a policy that maximizes extrinsic rewards in Minigrid. Overall, our results suggest that CBET provides a positive impact on DreamerV3 in more complex environments like Crafter but may be detrimental in environments like Minigrid. In the latter case, the behaviours promoted by CBET in DreamerV3 may not align with the task objectives of the environment, leading to reduced returns and suboptimal policies.
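To make the mechanism concrete, below is a minimal sketch of a change-based intrinsic reward in the spirit of CBET: the bonus decays with how often a particular observation "change" (and resulting state) has been seen, so novel environment changes are rewarded more. The byte-hashing scheme, the exact inverse-count reward form, and the class name `ChangeBasedBonus` are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

import numpy as np


class ChangeBasedBonus:
    """Intrinsic reward that decays with how often a state change is seen."""

    def __init__(self):
        self.change_counts = defaultdict(int)  # N(c): counts of observed changes
        self.state_counts = defaultdict(int)   # N(s'): counts of resulting states

    def _key(self, arr):
        # Hash an observation array by its raw bytes; real implementations may
        # use learned embeddings or domain-specific hashing instead.
        return np.asarray(arr).tobytes()

    def reward(self, obs, next_obs):
        # Treat the element-wise difference between consecutive observations
        # as the "change"; rarer changes and rarer states earn larger bonuses.
        change = np.asarray(next_obs) - np.asarray(obs)
        c_key, s_key = self._key(change), self._key(next_obs)
        self.change_counts[c_key] += 1
        self.state_counts[s_key] += 1
        return 1.0 / (self.change_counts[c_key] + self.state_counts[s_key])
```

Under this sketch, the first occurrence of a transition yields a bonus of 0.5, and repeating the same transition halves it, so an agent is pushed toward actions that change the environment in new ways.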
Problem

Research questions and friction points this paper is trying to address.

Adapting CBET for world model algorithms like DreamerV3
Evaluating CBET's impact in sparse reward environments
Assessing CBET's alignment with task objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts the model-free CBET intrinsic reward for world model algorithms (DreamerV3)
Head-to-head comparison of DreamerV3 and IMPALA agents, with and without CBET
Tabula rasa and transfer-learning evaluations in the sparse reward environments Crafter and Minigrid