Code2Worlds: Empowering Coding LLMs for 4D World Generation

📅 2026-02-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Coding LLMs can already generate static 3D scenes, but they struggle to produce physically plausible, dynamically consistent 4D world simulations. The authors formulate 4D generation as translating natural language into executable simulation code and propose a dual-stream architecture that decouples retrieval-augmented object generation from hierarchical, multi-scale environmental orchestration. A physics-aware closed loop then refines the result: a PostProcess Agent scripts the dynamics, and a vision-language-model-based motion evaluator (the VLM-Motion Critic) performs self-reflection to iteratively revise the simulation code. On the Code4D benchmark, the method outperforms baselines, increasing Scene Geometry Score (SGS) by 41% and Richness by 49%, and it generates physics-aware dynamics that prior static methods do not produce.
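The dual-stream decomposition can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the Code2Worlds codebase: the `ScenePlan` type, the `REFERENCE_LIBRARY` lookup standing in for retrieval, and the placeholder strings standing in for generated simulation code. The only point it shows is that object-level code and environment-level layout are produced by separate streams and merged afterwards.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScenePlan:
    """Hypothetical container that merges the outputs of both streams."""
    objects: Dict[str, str] = field(default_factory=dict)  # object name -> object code
    layout: List[str] = field(default_factory=list)        # ordered environment steps


# Assumed stand-in for a retrieval index of reference object snippets.
REFERENCE_LIBRARY = {
    "tree": "make_tree(height=5)",
    "rock": "make_rock(size=1)",
}


def object_stream(names: List[str]) -> Dict[str, str]:
    """Object stream: look up a reference snippet, else fall back to a generic one."""
    return {n: REFERENCE_LIBRARY.get(n, f"make_generic('{n}')") for n in names}


def environment_stream(names: List[str]) -> List[str]:
    """Environment stream: global settings first, then per-object placement."""
    steps = ["set_terrain()", "set_lighting()"]
    steps += [f"place('{n}')" for n in names]
    return steps


def build_plan(names: List[str]) -> ScenePlan:
    """Run the two streams independently and combine their results."""
    return ScenePlan(objects=object_stream(names), layout=environment_stream(names))


plan = build_plan(["tree", "lamp"])
print(plan.objects)
print(plan.layout)
```

Keeping the two streams as separate functions means local object detail and global layout never share one monolithic prompt, which is the entanglement the paper says it wants to avoid.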

📝 Abstract
Achieving spatial intelligence requires moving beyond visual plausibility to build world simulators grounded in physical laws. While coding LLMs have advanced static 3D scene generation, extending this paradigm to 4D dynamics remains a critical frontier. This task presents two fundamental challenges: multi-scale context entanglement, where monolithic generation fails to balance local object structures with global environmental layouts; and a semantic-physical execution gap, where open-loop code generation leads to physical hallucinations lacking dynamic fidelity. We introduce Code2Worlds, a framework that formulates 4D generation as language-to-simulation code generation. First, we propose a dual-stream architecture that disentangles retrieval-augmented object generation from hierarchical environmental orchestration. Second, to ensure dynamic fidelity, we establish a physics-aware closed-loop mechanism in which a PostProcess Agent scripts dynamics, coupled with a VLM-Motion Critic that performs self-reflection to iteratively refine simulation code. Evaluations on the Code4D benchmark show Code2Worlds outperforms baselines with a 41% SGS gain and 49% higher Richness, while uniquely generating physics-aware dynamics absent in prior static methods. Code: https://github.com/AIGeeksGroup/Code2Worlds. Website: https://aigeeksgroup.github.io/Code2Worlds.
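The closed-loop mechanism described in the abstract (generate code, simulate, critique the motion, refine) can be sketched as a generic loop. This is a hedged illustration, not the authors' implementation: `refine_simulation_code`, `Critique`, and the three toy callables are hypothetical names, and the toy critic is a simple rule standing in for a VLM that would inspect rendered frames.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    """Feedback from a (hypothetical) motion critic on one simulated rollout."""
    passed: bool
    feedback: str


def refine_simulation_code(
    prompt: str,
    generate: Callable[[str, str], str],
    simulate: Callable[[str], float],
    critic: Callable[[str, float], Critique],
    max_rounds: int = 3,
) -> Tuple[str, List[Critique]]:
    """Generate code, run it, critique the result, and retry with the feedback."""
    history: List[Critique] = []
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        code = generate(prompt, feedback)   # code writer sees the last critique
        rollout = simulate(code)            # execute the candidate simulation
        critique = critic(prompt, rollout)  # judge the resulting motion
        history.append(critique)
        if critique.passed:
            break
        feedback = critique.feedback
    return code, history


# Toy stand-ins (assumptions) so the loop can be run end to end.
def toy_generate(prompt: str, feedback: str) -> str:
    gravity = -9.8 if "gravity" in feedback else 0.0
    return f"gravity = {gravity}"


def toy_simulate(code: str) -> float:
    # "Run" the code by reading back the gravity value it sets.
    return float(code.split("=")[1])


def toy_critic(prompt: str, rollout: float) -> Critique:
    if rollout < 0:
        return Critique(True, "")
    return Critique(False, "object floats; add gravity")


final_code, history = refine_simulation_code(
    "a ball falls to the ground", toy_generate, toy_simulate, toy_critic
)
print(final_code)
print([c.passed for c in history])
```

In this toy run the first candidate has no gravity, the critic rejects it, and the second candidate passes. The `max_rounds` cap is a design choice of the sketch that bounds the cost of the loop when the critic never accepts a candidate.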
Problem

Research questions and friction points this paper is trying to address.

4D world generation
spatial intelligence
physical fidelity
dynamic simulation
coding LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D world generation
coding LLMs
physics-aware simulation
closed-loop refinement
dual-stream architecture