🤖 AI Summary
This study investigates how well world models generalize under environmental variation and what that implies for real-world robotic deployment. Using visual quadrotor navigation as a testbed, the authors build on the DreamerV3 framework, combining self-supervised pretraining with reinforcement learning fine-tuning, and evaluate cross-environment performance across varying levels of environmental randomness. They find that generalization during self-supervised pretraining strongly predicts sim-to-real transfer success, and identify discrete latent size and training sequence length as the dominant factors governing world model quality. In real-world experiments, well-generalizing models enable a quadrotor to complete a 12-meter traverse purely in imagination after receiving only 2.5 seconds of real sensory input, and to pass through gaps as narrow as 0.67 meters, whereas the model that performed best in simulation policy evaluation failed in physical deployment.
📝 Abstract
World models, learned generative models that predict how an environment evolves, have become a promising tool for sample-efficient robot learning. Yet how robust they are to environmental variability remains poorly understood. To address this, we conduct a systematic study using vision-based quadrotor navigation as a testbed problem, training DreamerV3-based world models under varying levels of environmental randomness and evaluating them across all levels through cross-environment validation, spanning both Self-Supervised Learning (SSL) pretraining and Reinforcement Learning (RL) fine-tuning. We then deploy all world models and associated navigation policies on a real quadrotor in unseen environments, including an open-loop run where the model receives just 2.5 s of real sensory input before all sensors are cut off, leaving the system to navigate entirely in imagination over a 12 m traverse. Our results show that world model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67 m, whereas the model that dominated simulation policy evaluation failed on the real platform. We further identify (a) the discrete latent size and (b) the training-sequence length as the dominant factors governing world model quality.
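The cross-environment validation described above can be pictured as a train-level by eval-level score matrix. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names `cross_env_matrix` and `generalization_gap`, the randomness levels, and the `toy_loss` stand-in are all hypothetical, and the abstract does not specify the actual metric.

```python
import numpy as np

def cross_env_matrix(levels, evaluate):
    """Entry [i, j] is the loss of the model trained at levels[i],
    evaluated on data generated at levels[j]."""
    n = len(levels)
    scores = np.zeros((n, n))
    for i, train_level in enumerate(levels):
        for j, eval_level in enumerate(levels):
            scores[i, j] = evaluate(train_level, eval_level)
    return scores

def generalization_gap(scores):
    """Per trained model: mean loss on the other levels (off-diagonal)
    minus the loss on its own training level (diagonal)."""
    n = scores.shape[0]
    in_domain = np.diag(scores)
    off_domain_mean = (scores.sum(axis=1) - in_domain) / (n - 1)
    return off_domain_mean - in_domain

# Hypothetical stand-in for a real world-model evaluation (assumption):
# the loss grows with the mismatch between training and evaluation randomness.
def toy_loss(train_level, eval_level):
    return 1.0 + abs(train_level - eval_level)

levels = [0.0, 0.5, 1.0]  # assumed randomness levels, for illustration only
scores = cross_env_matrix(levels, toy_loss)
gaps = generalization_gap(scores)
```

In this toy setup a smaller gap means the model's loss degrades less when the environment randomness differs from what it was trained on. With a real evaluation function in place of `toy_loss`, the same matrix could be computed once for SSL pretraining and once for RL fine-tuning.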