Adapting a World Model for Trajectory Following in a 3D Game

📅 2025-04-16
🤖 AI Summary
This work investigates trajectory imitation learning in the complex 3D game *Bleeding Edge*, addressing robustness challenges arising from environmental stochasticity and train-deploy distribution shift. To mitigate the distribution shift caused by aleatoric uncertainty and agent imperfections, we evaluate several future alignment strategies for inverse dynamics models (IDMs). We systematically compare combinations of visual encoders (DINOv2 versus trained from scratch) and policy heads (GPT-style autoregressive versus MLP-style feedforward). Results show that, with sufficient diverse data, a from-scratch encoder paired with a GPT-style head achieves the best performance; in the low-data regime, DINOv2 with a GPT-style head is superior; and under pretraining followed by fine-tuning, both head types perform comparably. Additionally, we introduce a trajectory deviation quantification metric, enabling fine-grained robustness analysis for imitation learning. Our study provides principled insights into architecture selection and evaluation methodology for robust behavioral cloning in stochastic 3D environments.

📝 Abstract
Imitation learning is a powerful tool for training agents by leveraging expert knowledge, and being able to replicate a given trajectory is an integral part of it. In complex environments, like modern 3D video games, distribution shift and stochasticity necessitate robust approaches beyond simple action replay. In this study, we apply Inverse Dynamics Models (IDMs) with different encoders and policy heads to trajectory following in a modern 3D video game, *Bleeding Edge*. Additionally, we investigate several future alignment strategies that address the distribution shift caused by aleatoric uncertainty and the imperfections of the agent. We measure both the trajectory deviation distance and the first significant deviation point between the reference and the agent's trajectory, and show that the optimal configuration depends on the chosen setting. Our results show that in a diverse data setting, a GPT-style policy head with an encoder trained from scratch performs best; a DINOv2 encoder with the GPT-style policy head gives the best results in the low-data regime; and the GPT-style and MLP-style policy heads perform comparably when pre-trained on a diverse setting and fine-tuned for a specific behaviour setting.
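The two evaluation quantities named in the abstract, the trajectory deviation distance and the first significant deviation point, can be sketched as follows. This is an illustrative reconstruction, not the paper's exact definitions: the function name, the use of a mean Euclidean distance, and the deviation threshold are all assumptions.

```python
import math

def deviation_metrics(reference, rollout, threshold=2.0):
    """Hypothetical sketch of the two robustness metrics.

    reference, rollout: lists of (x, y, z) agent positions, one per timestep.
    Returns (mean deviation distance over the overlapping steps,
    index of the first step whose deviation exceeds `threshold`,
    or None if the rollout never deviates significantly).
    """
    T = min(len(reference), len(rollout))
    # Euclidean distance between reference and rollout position at each step
    dists = [math.dist(reference[t], rollout[t]) for t in range(T)]
    # First timestep at which the agent has significantly left the trajectory
    first = next((t for t, d in enumerate(dists) if d > threshold), None)
    return sum(dists) / T, first
```

A lower mean deviation and a later (or absent) first significant deviation point both indicate a more robust trajectory-following agent.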
Problem

Research questions and friction points this paper addresses.

- Adapting a world model for trajectory following in a 3D game
- Addressing distribution shift in imitation learning
- Evaluating encoder and policy head performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Inverse Dynamics Models (IDMs) for trajectory following
- GPT-style policy head for diverse data settings
- DINOv2 encoder for the low-data regime
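The role of an inverse dynamics model in trajectory following can be sketched as a simple control loop: the IDM infers the action that should carry the agent from its current observation to the next reference observation, and the environment (which is stochastic, hence the drift the paper studies) executes it. Everything here is an illustrative assumption; `idm_predict` stands in for a learned model such as the paper's encoder plus GPT-style or MLP-style policy head.

```python
def follow_trajectory(env_step, idm_predict, reference_obs, start_obs):
    """Hypothetical trajectory-following loop driven by an IDM.

    env_step: callable(action) -> next observation (the environment).
    idm_predict: callable(current_obs, target_obs) -> action believed
        to move the agent from current_obs towards target_obs.
    reference_obs: sequence of observations from the expert trajectory.
    """
    obs = start_obs
    actions = []
    for target in reference_obs[1:]:
        action = idm_predict(obs, target)  # infer the connecting action
        obs = env_step(action)             # stochastic env: obs may drift from target
        actions.append(action)
    return actions, obs
```

Because the loop re-queries the IDM from the agent's *actual* observation at every step, it can correct for drift, which is what makes it more robust than replaying the expert's recorded actions verbatim.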
Authors

Marko Tot (Microsoft Research; Queen Mary University of London)
Shu Ishida (Senior Research Scientist, Autodesk Research)
Abdelhak Lemkhenter (Microsoft Research)
David Bignell (Research Scientist, Microsoft Research)
Pallavi Choudhury (Microsoft Research)
Chris Lovett (Microsoft Research)
Luis França (Microsoft Research)
Matheus Ribeiro Furtado de Mendonça (Microsoft Research)
Tarun Gupta (University of Oxford; Google Waymo Research)
Darren Gehring (Microsoft Research)
Sam Devlin (Meta)
Sergio Valcarcel Macua (Microsoft Research)
Raluca Georgescu (Microsoft Research)