World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications

📅 2026-05-28
🤖 AI Summary
This work addresses the lack of a unified framework in world model research, which has hindered systematic integration across diverse architectures, methodologies, reasoning paradigms, and applications. To bridge this gap, the paper proposes a four-dimensional taxonomy covering architecture (e.g., representation format, dynamics formulation, input modality), methodological families (e.g., state-space and recurrent models, Transformers, diffusion models, physics-informed networks, language-augmented multimodal systems), reasoning strategies (e.g., imagination-based planning and counterfactual reasoning), and application domains. The framework connects foundational insights from cognitive science to recent large-scale models and highlights the recent convergence of chain-of-thought reasoning with world-model imagination. It maps the field's development, identifies core challenges such as compounding prediction errors and sim-to-real transfer, and outlines a pathway toward unified multimodal world models.
📝 Abstract
World models, internal simulators that learn the structure and dynamics of an environment, have emerged as a central paradigm in the pursuit of artificial general intelligence, enabling agents to predict, plan, and reason within learned representations. Despite rapid progress across reinforcement learning, robotics, autonomous driving, and video generation, the field lacks a unified framework integrating its diverse architectural choices, training methods, reasoning mechanisms, and application settings. This survey addresses that gap with a multi-axis taxonomy organized along four dimensions: (i) architecture, encompassing representation format, dynamics formulation, input modality, learning paradigm, and downstream application; (ii) methodological family, including state-space and recurrent approaches, transformer-based models, diffusion-based generators, physics-informed networks, and language-augmented multimodal systems; (iii) reasoning strategy, covering imagination-based planning, latent policy learning, counterfactual reasoning, and planning under uncertainty; and (iv) application domain, spanning robotics, autonomous driving, video prediction, multimodal agents, reinforcement learning, scientific modeling, medical imaging, educational measurement, and business and finance. Tracing the field from early cognitive-science foundations to milestone systems such as PlaNet, the Dreamer family, MuZero, Sora, Cosmos, and Genie, we examine how these dimensions interact and highlight the recent convergence of chain-of-thought reasoning with world-model imagination. We review evaluation protocols and benchmarks, identify persistent challenges such as compounding prediction errors, sim-to-real transfer, and fragmented evaluation, and outline future directions toward unified multimodal world models, foundation-scale interactive simulators, and safe deployment in safety-critical domains.
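As a rough illustration of the imagination-based planning mentioned in the abstract, the following minimal sketch rolls candidate action sequences through a model and picks the best one without touching a real environment. It is not taken from the survey: `learned_dynamics`, `reward`, `imagine_return`, and `plan` are hypothetical names, the one-dimensional dynamics are a hand-written stand-in for a learned model, and exhaustive enumeration replaces the sampling or optimization that practical planners would use.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)  # assumed discrete action set for this toy example


def learned_dynamics(state, action):
    """Hypothetical stand-in for a learned world model: predicts the next state."""
    return state + action


def reward(state, goal=5.0):
    """Hypothetical reward: negative distance between the state and a goal."""
    return -abs(state - goal)


def imagine_return(state, actions):
    """Roll the model forward over an action sequence and sum imagined rewards."""
    total = 0.0
    for action in actions:
        state = learned_dynamics(state, action)
        total += reward(state)
    return total


def plan(state, horizon=5):
    """Score every action sequence in imagination; return (first action, best return)."""
    best_actions, best_return = None, float("-inf")
    for actions in product(ACTIONS, repeat=horizon):
        imagined = imagine_return(state, actions)
        if imagined > best_return:
            best_actions, best_return = actions, imagined
    return best_actions[0], best_return


if __name__ == "__main__":
    first_action, imagined = plan(0.0)
    print(first_action, imagined)
```

In this sketch only the first action of the best imagined sequence is returned, which mirrors the replan-at-every-step pattern common in model-predictive control. With a learned rather than hand-written model, errors in `learned_dynamics` would accumulate over the horizon, which is the compounding-error challenge the abstract refers to.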
Problem

Research questions and friction points this paper is trying to address.

world models
unified framework
architectural diversity
reasoning paradigms
application fragmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

world models
multi-axis taxonomy
reasoning paradigms
multimodal integration
foundation-scale simulators
Arif Hassan Zidan
Graduate Research Assistant
Brain Imaging, Computational Neuroscience, Artificial Intelligence, Machine Learning
Yi Pan
University of Georgia
Brain-inspired AI, Artificial General Intelligence
Hanqi Jiang
University of Georgia
Medical Image Analysis, Multi-modal Large Language Models
Ruiyu Yan
Tandon School of Engineering, New York University, Brooklyn, NY, USA
Wei Ruan
University of Georgia
Zihao Wu
University of Georgia
Brain-inspired AI, Artificial General Intelligence, NLP, Medical Image Analysis
Lifeng Chen
School of Computing, University of Georgia, Athens, GA, USA
Weihang You
School of Computing, University of Georgia, Athens, GA, USA
Xinliang Li
School of Computing, University of Georgia, Athens, GA, USA
Bowen Chen
Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
Huawen Hu
Northwestern Polytechnical University
Reinforcement Learning, Robotics, Brain Computer Interface
Peilong Wang
City of Hope
Physics, AI, Imaging
Sizhuang Liu
Savannah River Ecology Laboratory (SREL), University of Georgia, Aiken, SC, USA
Jing Zhang
University of Texas at Arlington
Large Language Models, Vision-language models, Brain inspired AI, Medical imaging analysis
Siyuan Li
College of William and Mary
Personalization, Recommendation Systems, Social Network, Green IS/IT
Zhengliang Liu
University of Georgia
Natural Language Processing, Medical NLP, Medical Image Analysis, Data Visualization
Yu Bao
Assistant Professor, James Madison University
Educational measurement, Psychometrics, Higher Education Assessment
Lin Zhao
New Jersey Institute of Technology
Brain-inspired AI, Medical Image Analysis, Artificial General Intelligence
Lichao Sun
Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA
Dajiang Zhu
University of Texas at Arlington
Computer Science, Computational Neuroscience, Medical Imaging
Xiang Li
Assistant Professor, Massachusetts General Hospital and Harvard Medical School
Medical Foundation Model, Medical Informatics, Multi-modal Fusion, Causal Inference, Brain
Jinglei Lv
University of Sydney
Biomedical Imaging, Medical Image Analysis, Neuroimaging, EEG, Neuroscience
Quanzheng Li
Massachusetts General Hospital, Harvard Medical School
Image Reconstruction, Medical Image Analysis, Deep Learning in Medicine, Multimodality Medical Data Analysis
Wei Liu
Department of Mayo Clinic Comprehensive Cancer Center, Mayo Clinics, Phoenix, AZ, USA
Tianming Liu
Distinguished Research Professor of Computer Science, University of Georgia
Brain, Brain-Inspired AI, LLM, Artificial General Intelligence, Quantum AI