3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models

📅 2025-03-27
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Despite rapid advances in 3D generation, the field critically lacks human-perception-aligned automatic evaluation metrics and large-scale, multi-dimensional human preference datasets for benchmarking. Method: We introduce 3DGen-Bench, the first open-source, large-scale human preference dataset for 3D generative models, featuring diverse text-to-3D and image-to-3D prompts annotated jointly by domain experts and general users. To enable unified, quantitative assessment, we propose a dual-engine automatic evaluation framework: (i) 3DGen-Score, a fine-tuned CLIP-based metric for perceptual quality; and (ii) 3DGen-Eval, an MLLM-based evaluator supporting cross-modal (text/image) input. Contribution/Results: Experiments demonstrate that 3DGen-Score correlates with human rankings significantly better than existing metrics, establishing a foundation for fair, standardized, and perception-grounded evaluation of 3D generative models.
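The headline quantitative claim is correlation with human rankings. As context, here is a minimal sketch of how such agreement is typically measured, using Kendall's tau and Spearman's rho over per-model scores; the model ranks and metric values below are invented for illustration, not the paper's results.

```python
# Sketch: checking how well an automatic metric agrees with a human ranking
# of generative models. All numbers here are made up for illustration.
from scipy.stats import kendalltau, spearmanr

# Human preference ranking of five hypothetical 3D generators (1 = best),
# and the automatic metric's scores for the same models (higher = better).
human_rank = [1, 2, 3, 4, 5]
metric_score = [0.91, 0.84, 0.79, 0.80, 0.62]

# Negate ranks so both sequences are "higher is better" before correlating.
quality_from_rank = [-r for r in human_rank]

tau, tau_p = kendalltau(quality_from_rank, metric_score)
rho, rho_p = spearmanr(quality_from_rank, metric_score)
print(f"Kendall tau = {tau:.3f} (p={tau_p:.3f}), "
      f"Spearman rho = {rho:.3f} (p={rho_p:.3f})")
```

A metric that "achieves significantly higher correlation" would simply show a tau/rho closer to 1.0 than competing metrics on the same set of human-ranked models.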

📝 Abstract
3D generation is advancing rapidly, while the development of 3D evaluation has not kept pace. Keeping automatic evaluation fairly aligned with human perception has become a well-recognized challenge. Recent advances in language and image generation have explored human preferences and demonstrated a respectable ability to fit them. However, the 3D domain still lacks such a comprehensive preference dataset over generative models. To fill this gap, we develop 3DGen-Arena, an integrated platform for arena-style pairwise battles. We then carefully design diverse text and image prompts and use the arena platform to gather human preferences from both public users and expert annotators, resulting in 3DGen-Bench, a large-scale, multi-dimensional human preference dataset. Using this dataset, we further train a CLIP-based scoring model, 3DGen-Score, and an MLLM-based automatic evaluator, 3DGen-Eval. These two models unify the quality evaluation of text-to-3D and image-to-3D generation and, with their respective strengths, jointly form our automated evaluation system. Extensive experiments demonstrate the efficacy of our scoring model in predicting human preferences, showing a superior correlation with human ranks compared to existing metrics. We believe that the 3DGen-Bench dataset and automated evaluation system will foster more equitable evaluation in the field of 3D generation, further promoting the development of 3D generative models and their downstream applications.
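The abstract describes collecting preferences through arena-style battles but does not spell out how votes become a ranking. A standard choice for this (popularized by Chatbot Arena) is an Elo update over pairwise outcomes; the sketch below is an assumption about how such aggregation could work, with placeholder model names and ratings, not the authors' exact scheme.

```python
# Sketch: aggregating arena-style pairwise votes into a model leaderboard
# via Elo updates. The K-factor, initial rating, and model names are assumed.

def elo_update(r_a: float, r_b: float, winner: str, k: float = 32.0):
    """One Elo update after a battle between models A and B.
    winner is 'A', 'B', or 'tie'."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = {"A": 1.0, "B": 0.0, "tie": 0.5}[winner]
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b

# Replay a stream of (model_a, model_b, winner) votes from the arena.
ratings = {"gen1": 1000.0, "gen2": 1000.0}
battles = [("gen1", "gen2", "A"), ("gen1", "gen2", "tie")]
for a, b, w in battles:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], w)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```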
Problem

Research questions and friction points this paper is trying to address.

Lack of comprehensive 3D generative model evaluation benchmarks
Need for automatic evaluation aligned with human perception
Absence of unified quality metrics for text-to-3D and image-to-3D
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed 3DGen-Arena, an arena-style battle platform for collecting pairwise human preferences
Created 3DGen-Score, a CLIP-based scoring model for quality evaluation (see the sketch after this list)
Built 3DGen-Eval, an MLLM-based evaluator for automatic assessment
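To make the 3DGen-Score idea concrete, below is a minimal sketch of a CLIP-based preference scorer: score an asset by the text-image similarity between the prompt and its rendered views, and fine-tune with a pairwise Bradley-Terry style loss on human votes. The backbone checkpoint, the multi-view rendering pipeline, and the loss are assumptions for illustration, not the authors' exact recipe.

```python
# Sketch of a CLIP-based preference scorer in the spirit of 3DGen-Score.
# Backbone choice and training details are assumed, not from the paper.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")  # assumed backbone
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score(prompt: str, rendered_views) -> torch.Tensor:
    """Average CLIP text-image similarity over rendered views of one 3D asset.
    rendered_views is a list of PIL images rendered from the asset."""
    inputs = processor(text=[prompt], images=rendered_views,
                       return_tensors="pt", padding=True)
    text_emb = F.normalize(
        model.get_text_features(input_ids=inputs["input_ids"],
                                attention_mask=inputs["attention_mask"]), dim=-1)
    img_emb = F.normalize(
        model.get_image_features(pixel_values=inputs["pixel_values"]), dim=-1)
    return (img_emb @ text_emb.T).mean()  # scalar preference score

def pairwise_loss(score_winner: torch.Tensor,
                  score_loser: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry ranking loss: push the human-preferred asset higher."""
    return -F.logsigmoid(score_winner - score_loser)
```

The released model scores multiple quality dimensions across both text and image conditions; this sketch collapses everything into a single text-image alignment score to keep the core pairwise-training idea visible.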