Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

📅 2025-03-12
🤖 AI Summary
To address the prohibitively high training costs and commercial barriers of video generation models, this paper introduces Open-Sora 2.0, presented as the first open-source video generation model to reach production-level performance within a $200k budget. Methodologically, it combines fine-grained data curation, lightweight spatiotemporal attention, progressive training scheduling, and distributed system optimizations to improve training efficiency and hardware utilization. In human evaluation and on the VBench benchmark, Open-Sora 2.0 is reported to be comparable to HunyuanVideo and Runway Gen-3 Alpha. The model architecture, training code, and checkpoint weights are all open-sourced to support reproducibility. The work argues that low-cost, high-fidelity, fully open video generation is feasible, which lowers the barrier to practical deployment and broadens access to AI-powered video synthesis.

📝 Abstract
Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to global leading video generation models including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.
Problem

Research questions and friction points this paper is trying to address.

Reducing training cost for commercial-level video generation models.
Achieving high-quality video generation with efficient resource utilization.
Democratizing access to advanced video generation technology.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cost-effective training at $200k
Efficient data curation and model architecture
Open-source commercial-level video generation
Authors

Xiangyu Peng
Zangwei Zheng (Ph.D., National University of Singapore; Machine Learning, High Performance Computing, Computer Vision)
Chenhui Shen
Tom Young
Xinying Guo
Binluo Wang
Hang Xu
Hongxin Liu
Mingyan Jiang
Wenjun Li
Yuhui Wang
Anbang Ye (HPC-AI Tech; Natural Language Processing, Machine Learning)
Gang Ren
Qianran Ma
Wanying Liang
Xiang Lian
Xiwen Wu
Yuting Zhong
Zhuangyan Li
Chaoyu Gong (NTU)
Guojun Lei
Leijun Cheng
Limin Zhang
Minghao Li (Beihang University; Natural Language Processing)
Ruijie Zhang
Silan Hu
Shijie Huang
Xiaokang Wang
Yuanheng Zhao
Yuqi Wang
Ziang Wei
Yang You (Postdoc, Stanford University; 3D vision, computer graphics, computational geometry)