Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing world model evaluation benchmarks predominantly emphasize visual fidelity or static 3D reconstruction, lacking systematic assessment of spatiotemporal interactive responsiveness. This work proposes the first comprehensive evaluation framework tailored to the 4D generation paradigm, introducing Omni-WorldSuite—a prompt suite encompassing multi-level interactions and diverse scene types—and Omni-Metrics, an agent-based causal influence measurement system that analyzes spatiotemporal state trajectories. Through large-scale evaluation of 18 leading models, the study reveals significant deficiencies in current approaches' ability to model interactive responses. The released Omni-WorldBench aims to catalyze progress in interactive world modeling by providing a standardized, holistic benchmark for future research.

📝 Abstract
Video-based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentally neglect temporal dynamics. We argue that the future of world modeling lies in 4D generation, which jointly models spatial structure and temporal evolution. In this paradigm, the core capability is interactive response: the ability to faithfully reflect how interaction actions drive state transitions across space and time. Yet no existing benchmark systematically evaluates this critical dimension. To address this gap, we propose Omni-WorldBench, a comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models in 4D settings. Omni-WorldBench comprises two key components: Omni-WorldSuite, a systematic prompt suite spanning diverse interaction levels and scene types; and Omni-Metrics, an agent-based evaluation framework that quantifies world modeling capabilities by measuring the causal impact of interaction actions on both final outcomes and intermediate state evolution trajectories. We conduct extensive evaluations of 18 representative world models across multiple paradigms. Our analysis reveals critical limitations of current world models in interactive response, providing actionable insights for future research. Omni-WorldBench will be publicly released to foster progress in interactive 4D world modeling.
Problem

Research questions and friction points this paper is trying to address.

world models
interactive response
4D generation
evaluation benchmark
temporal dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

interactive response
4D world modeling
agent-based evaluation
causal impact
Omni-WorldBench
Meiqi Wu
the University of Chinese Academy of Sciences
Computer vision
Zhixin Cai
School of Computer Science and Engineering, Beihang University
Fufangchen Zhao
State Key Laboratory of Networking and Switching Technology, BUPT
Xiaokun Feng
Institute of Automation, Chinese Academy of Sciences
Computer vision, deep learning
Rujing Dang
AMAP, Alibaba Group
Bingze Song
AMAP, Alibaba Group
Ruitian Tian
AMAP, Alibaba Group
Jiashu Zhu
AMAP, Alibaba Group
Jiachen Lei
AMAP, Alibaba Group
Hao Dou
Institute of Automation, Chinese Academy of Sciences
Machine Learning, Image Processing
Jing Tang
AMAP, Alibaba Group
Lei Sun
AMAP, Alibaba Group
Jiahong Wu
Alibaba-AMAP
AI, ML, AIGC, MLLM
Xiangxiang Chu
AMAP, Alibaba Group
Zeming Liu
School of Computer Science and Engineering, Beihang University
Kaiqi Huang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, CASIA