A Comprehensive Ecosystem for Open-Domain Customized Video Generation

📅 2026-06-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses open-domain personalized video generation, which is hindered by the scarcity of large-scale, identity-annotated multimodal data. To this end, the authors introduce PexelsCustom-1M, a dataset of one million identity–text–video triplets, and propose CustoMDiT, a parameter-efficient framework that adapts a pretrained multimodal Diffusion Transformer into a personalized video generator with only 8% additional learnable parameters. The authors report that the method surpasses prior state-of-the-art approaches. They also construct OpenCustom, a new benchmark covering more than 1,000 identity categories, built through cross-dataset knowledge fusion from ImageNet and MS-COCO. The authors plan to release the dataset, training pipeline, benchmark, and code to advance research in open-domain customized video generation.
📝 Abstract
Recent progress in video generation has shown impressive visual synthesis capabilities. However, open-domain customized video generation remains limited by the lack of large-scale, annotated datasets capturing diverse identity-specific attributes. To address this, we introduce PexelsCustom-1M, the first publicly available million-scale dataset for identity-preserving video generation, containing one million curated <identity, text, video> triplets across 8,000+ categories. Leveraging this dataset, we propose CustoMDiT, a parameter-efficient framework that adapts a pretrained multimodal Diffusion Transformer into a customized video generator with only 8% additional learnable parameters. Our method surpasses prior state-of-the-art methods. However, existing benchmarks such as DreamBooth cover only 100 classes, which is insufficient for real-world applications. To overcome this, we construct OpenCustom, a new benchmark with 1,000+ categories, created via cross-dataset knowledge fusion from ImageNet and MS-COCO. Extensive experiments confirm the advantages of both our dataset and model. We will open-source the entire ecosystem (dataset, pipeline, benchmark, and implementations) to support further research.
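As a rough illustration of what "only 8% additional learnable parameters" can mean in practice, the sketch below computes the added parameter count and the trainable share of the adapted model. It assumes the 8% is measured relative to the pretrained backbone and that the backbone stays frozen; the abstract states neither. The function name `adapter_budget` and the 10-billion-parameter backbone size are hypothetical placeholders, not figures from the paper.

```python
def adapter_budget(base_params: int, extra_ratio: float = 0.08) -> dict:
    """Parameter counts for a frozen backbone plus added trainable weights.

    Assumption: `extra_ratio` is relative to the backbone size. The abstract
    does not say what the 8% is measured against.
    """
    added = round(base_params * extra_ratio)   # new learnable parameters
    total = base_params + added                # backbone + added weights
    return {
        "added": added,
        "total": total,
        # Share of the adapted model that is trained, if the backbone is frozen.
        "trainable_share": added / total,
    }


# Hypothetical backbone size, chosen only for illustration.
budget = adapter_budget(10_000_000_000)
print(budget["added"])
print(f"{budget['trainable_share']:.3f}")
```

Under these assumptions, the trained weights make up slightly less than 8% of the final model, because the added parameters also enlarge the total.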
Problem

Research questions and friction points this paper is trying to address.

open-domain video generation
customized video generation
identity-preserving
large-scale dataset
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

customized video generation
PexelsCustom-1M
CustoMDiT
OpenCustom benchmark
parameter-efficient adaptation