M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation

📅 2025-09-28
🤖 AI Summary
Existing 3D indoor layout datasets suffer from limited scale, low diversity, and coarse annotations, hindering the development of text-to-3D layout generation models. To address these limitations, we introduce M3DLayout, the first multi-source 3D layout dataset integrating real-world scans, professional CAD designs, and procedural generation. It comprises 15,080 high-fidelity layouts and over 258,000 object instances, with each layout annotated with fine-grained, structured textual descriptions. We propose a cross-source data registration and semantic alignment framework to ensure consistency across heterogeneous sources, and establish a text-conditioned diffusion-based generation benchmark. Experiments demonstrate that models trained on M3DLayout achieve substantial improvements in the geometric complexity, semantic controllability, and detail fidelity of generated scenes. The dataset thus provides a scalable, high-fidelity benchmark for text-to-3D layout generation, enabling more robust and controllable synthesis.
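
The registration and alignment framework is only named in the summary. As a rough illustration of the semantic-alignment step, the sketch below maps source-specific category labels onto one shared taxonomy; every label and mapping here is a hypothetical stand-in, not the paper's actual vocabulary.

```python
from typing import Optional

# Hypothetical cross-source semantic alignment: map each source's raw category
# labels onto one shared taxonomy. Labels and mappings are illustrative
# assumptions, not the paper's label sets.
UNIFIED_TAXONOMY = {"sofa", "bed", "table", "chair", "lamp"}

SOURCE_LABEL_MAP = {
    "scan":       {"couch": "sofa", "double_bed": "bed", "desk": "table"},
    "cad":        {"Sofa_3Seat": "sofa", "BedFrame": "bed", "DiningTable": "table"},
    "procedural": {"sofa_generic": "sofa", "bed_generic": "bed"},
}

def align_label(source: str, raw_label: str) -> Optional[str]:
    """Return the unified category for a source-specific label, or None if unmapped."""
    unified = SOURCE_LABEL_MAP.get(source, {}).get(raw_label)
    return unified if unified in UNIFIED_TAXONOMY else None

# Example: align_label("cad", "Sofa_3Seat") -> "sofa"
```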

📝 Abstract
In text-driven 3D scene generation, object layout serves as a crucial intermediate representation that bridges high-level language instructions with detailed geometric output. It not only provides a structural blueprint for ensuring physical plausibility but also supports semantic controllability and interactive editing. However, the learning capabilities of current 3D indoor layout generation models are constrained by the limited scale, diversity, and annotation quality of existing datasets. To address this, we introduce M3DLayout, a large-scale, multi-source dataset for 3D indoor layout generation. M3DLayout comprises 15,080 layouts and over 258k object instances, integrating three distinct sources: real-world scans, professional CAD designs, and procedurally generated scenes. Each layout is paired with detailed structured text describing global scene summaries, relational placements of large furniture, and fine-grained arrangements of smaller items. This diverse and richly annotated resource enables models to learn complex spatial and semantic patterns across a wide variety of indoor environments. To assess the potential of M3DLayout, we establish a benchmark using a text-conditioned diffusion model. Experimental results demonstrate that our dataset provides a solid foundation for training layout generation models. Its multi-source composition enhances diversity, notably through the Inf3DLayout subset, which provides rich small-object information and enables the generation of more complex and detailed scenes. We hope that M3DLayout can serve as a valuable resource for advancing research in text-driven 3D scene synthesis.
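
The abstract specifies a three-part structured description per layout (global scene summary, relational placement of large furniture, fine-grained arrangement of smaller items) but no concrete schema. Purely as illustration, the sketch below shows what a single record could look like; every field name and type is an assumption, not the dataset's published format.

```python
# Hypothetical sketch of one M3DLayout record. Field names, types, and the
# object parameterization are assumptions for illustration only; the abstract
# states just that layouts pair object instances with three-tier text.
from dataclasses import dataclass, field

@dataclass
class ObjectInstance:
    category: str                            # e.g. "sofa" or "table_lamp"
    position: tuple[float, float, float]     # object center in scene coordinates
    size: tuple[float, float, float]         # bounding-box extents (w, h, d)
    yaw: float                               # orientation about the up axis, radians

@dataclass
class LayoutRecord:
    source: str                              # "scan" | "cad" | "procedural"
    objects: list[ObjectInstance] = field(default_factory=list)
    # Three-tier structured description, mirroring the abstract:
    scene_summary: str = ""                  # global scene summary
    furniture_relations: str = ""            # relational placement of large furniture
    small_item_details: str = ""             # fine-grained arrangement of smaller items
```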
Problem

Research questions and friction points this paper is trying to address.

Limited dataset scale and diversity constrain 3D layout generation models
Existing datasets lack quality annotations for complex spatial and semantic patterns
M3DLayout addresses these gaps with multi-source, richly annotated data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-source dataset combining scans, CAD designs, generated scenes
Structured text annotations for global and fine-grained descriptions
Text-conditioned diffusion model benchmark for layout generation (see the sketch below)
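
The benchmark is only named above. As a rough sketch of what text-conditioned diffusion over layout parameters involves, the DDPM-style sampling loop below reverse-diffuses a per-object parameter tensor from Gaussian noise; the denoiser network, tensor shapes, and noise schedule are all assumptions, not the paper's actual model.

```python
# Minimal DDPM-style sampling sketch for text-conditioned layout generation.
# Not the paper's model: `denoiser`, the (num_objects, obj_dim) layout encoding,
# and the linear noise schedule are illustrative assumptions.
import torch

def sample_layout(denoiser, text_emb, num_objects=12, obj_dim=10, steps=50):
    """Reverse-diffuse a (num_objects, obj_dim) layout tensor conditioned on text.

    Each row packs one object's parameters, e.g. class logits, position, size, yaw.
    """
    x = torch.randn(num_objects, obj_dim)        # start from pure Gaussian noise
    betas = torch.linspace(1e-4, 0.02, steps)    # assumed linear beta schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for t in reversed(range(steps)):
        eps = denoiser(x, t, text_emb)           # predict the injected noise
        # standard DDPM posterior mean update
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:                                # add noise except at the final step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```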
Authors
Yiheng Zhang, Tsinghua University
Zhuojiang Cai, Technical University of Munich (Human-Computer Interaction, Computer Vision)
Mingdao Wang, Tsinghua University
Meitong Guo, Tsinghua University
Tianxiao Li, Tsinghua University
Li Lin, Migu Beijing Research Institute
Yuwang Wang, Tsinghua University