Better World Models Can Lead to Better Post-Training Performance

📅 2025-12-03
🤖 AI Summary
This work investigates how explicit world modeling objectives influence internal representations of Transformers and downstream reinforcement learning performance. We systematically compare state prediction pretraining versus joint state prediction and language modeling pretraining in a 2×2×2 Rubik’s Cube environment, followed by post-training with Group Relative Policy Optimization (GRPO). Using linear probing and causal intervention analysis, we find that explicit world modeling markedly improves the linear separability and causal controllability of state representations. Moreover, world model fidelity strongly correlates with post-trained policy performance—especially on high-complexity states—yielding substantial gains. Crucially, this study establishes, for the first time, an intrinsic link between world model representation quality and policy optimization efficiency. Our findings provide both theoretical grounding and empirical evidence for developing embodied AI models with interpretable and causally manipulable state representations.

📝 Abstract
In this work we study how explicit world-modeling objectives affect the internal representations and downstream capability of Transformers across different training stages. We use a controlled 2×2×2 Rubik's Cube and ask: (1) how does explicitly pretraining a world model affect the model's latent representations, and (2) how does world-model quality affect the model's performance after reinforcement learning post-training? We compare standard next-token prediction to two explicit world-modeling strategies -- (i) state-prediction pretraining and (ii) a joint state-prediction + next-token objective -- and assess task performance after Group Relative Policy Optimization (GRPO) is applied as post-training. We evaluate representation quality with linear probes and causal interventions. We find that explicit world-modeling yields more linearly decodable and causally steerable state representations. More importantly, we find that improved state representations lead to higher gains from GRPO, especially on harder cube states. Our results indicate that sharpening state representations can improve the effectiveness of post-training for sequence-planning tasks.
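The linear-probing step described in the abstract amounts to training a linear classifier on frozen hidden states and checking whether a state property can be decoded from them. Below is a minimal sketch of that idea on synthetic data; the dimensions, labels, and the injected signal are illustrative assumptions, not the paper's actual setup or model.

```python
# Minimal linear-probe sketch (synthetic data, not the paper's setup).
# A "linear probe" is a linear classifier fit on frozen activations;
# high held-out accuracy means the feature is linearly decodable.
import numpy as np

rng = np.random.default_rng(0)
n, d, n_classes = 300, 64, 6  # samples, hidden size, label classes (assumed)

# Synthetic "hidden states": noise plus a weak linearly separable signal
# in one dimension per class, standing in for a decodable state feature.
labels = rng.integers(0, n_classes, size=n)
hidden = rng.normal(size=(n, d))
hidden[np.arange(n), labels] += 2.0

# Fit the probe in closed form: ridge regression onto one-hot targets,
# then predict by argmax over the class scores.
Y = np.eye(n_classes)[labels]
X_train, Y_train = hidden[:200], Y[:200]
W = np.linalg.solve(X_train.T @ X_train + 1e-2 * np.eye(d), X_train.T @ Y_train)

preds = (hidden[200:] @ W).argmax(axis=1)
accuracy = (preds == labels[200:]).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Accuracy well above chance (1/6 here) indicates the probed feature is linearly decodable from the representations; the paper's comparison is between accuracies under different pretraining objectives, not against chance alone.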
Problem

Research questions and friction points this paper addresses.

Examine how world-modeling objectives affect Transformer representations and downstream performance
Assess the impact of world-model quality on post-training reinforcement learning outcomes
Investigate whether sharper state representations improve sequence-planning effectiveness after post-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit world-modeling objectives improve Transformer representations
State-prediction pretraining enhances linear decodability and causal steerability
Better state representations boost reinforcement learning post-training performance
Prakhar Gupta
University of Michigan
Henry Conklin
Princeton University
Sarah-Jane Leslie
Class of 1943 Professor, Philosophy & Statistics and Machine Learning, Princeton University
Andrew Lee
Harvard University