Understanding World or Predicting Future? A Comprehensive Survey of World Models

📅 2024-11-21
🏛️ arXiv.org
📈 Citations: 16
Influential: 1
🤖 AI Summary
Problem: A fundamental ambiguity persists in world modeling research: is its core objective *understanding* underlying world mechanisms or *predicting* future dynamics? Method: This paper introduces the first “Understanding-oriented vs. Prediction-oriented” dichotomy framework to delineate the theoretical boundaries and synergistic relationships between the two. It conducts a cross-domain analysis spanning autonomous driving, robotics, and social simulation to characterize divergent application paradigms. Integrating multimodal foundation models (e.g., GPT-4), video generation (e.g., Sora), causal inference, neuro-symbolic modeling, and reinforcement learning, it constructs a unified four-dimensional taxonomy covering representation, prediction, intervention, and evaluation. Contribution/Results: The paper establishes the first comprehensive survey framework for world models, explicitly identifies key challenges (including interpretability, generalization, and causal validity), and proposes a three-tier evolutionary roadmap toward AGI. This work provides both a theoretical benchmark and practical guidance for advancing world modeling research.

📝 Abstract
The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence. This survey offers a comprehensive review of the literature on world models. Generally, world models are regarded as tools for either understanding the present state of the world or predicting its future dynamics. This review presents a systematic categorization of world models, emphasizing two primary functions: (1) constructing internal representations to understand the mechanisms of the world, and (2) predicting future states to simulate and guide decision-making. Initially, we examine the current progress in these two categories. We then explore the application of world models in key domains, including autonomous driving, robotics, and social simulacra, with a focus on how each domain utilizes these aspects. Finally, we outline key challenges and provide insights into potential future research directions.
Problem

Research questions and friction points this paper is trying to address.

Categorizes world models by function: understanding the mechanisms of the world or predicting its future dynamics
Explores applications in autonomous driving, robotics, and social simulacra
Identifies challenges and future research directions for world models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal large language models for world understanding
Video generation models for future prediction
Systematic categorization of world model functions