🤖 AI Summary
This work investigates how explicit world modeling objectives influence internal representations of Transformers and downstream reinforcement learning performance. We systematically compare state prediction pretraining versus joint state prediction and language modeling pretraining in a 2×2×2 Rubik’s Cube environment, followed by post-training with Group Relative Policy Optimization (GRPO). Using linear probing and causal intervention analysis, we find that explicit world modeling markedly improves the linear separability and causal controllability of state representations. Moreover, world model fidelity strongly correlates with post-trained policy performance—especially on high-complexity states—yielding substantial gains. Crucially, this study establishes, for the first time, an intrinsic link between world model representation quality and policy optimization efficiency. Our findings provide both theoretical grounding and empirical evidence for developing embodied AI models with interpretable and causally manipulable state representations.
📝 Abstract
In this work we study how explicit world-modeling objectives affect the internal representations and downstream capability of Transformers across different training stages. We use a controlled 2x2x2 Rubik's Cube and ask: (1) how does explicitly pretraining a world model affect the model's latent representations, and (2) how does world-model quality affect the model's performance after reinforcement learning post-training? We compare standard next-token prediction to two explicit world-modeling strategies -- (i) state-prediction pretraining and (ii) a joint state-prediction + next-token objective -- and assess task performance after Group Relative Policy Optimization (GRPO) is applied as post-training. We evaluate the representation quality with linear probes and causal interventions. We find that explicit world-modeling yields more linearly decodable and causally steerable state representations. More importantly, we find that improved state representations lead to higher gains for GRPO, especially on harder cube states. Our results indicate that sharpening state representations can improve the effectiveness of post-training for sequence-planning tasks.