🤖 AI Summary
Existing indoor scene generation methods struggle to achieve global coherence, realism, and simulation readiness at the same time, hindered by scarce 3D scene data and a focus on isolated sub-tasks. This work proposes a unified hierarchical framework that synthesizes fully furnished whole-home scenes in controllable stages. A large language model, trained on a large-scale dataset of real residential floorplans and paired with a K-D tree-based floorplan representation, generates the whole-home floorplan. Image generation models then draft furniture layouts from multi-level roaming viewpoints, after which small manipulable objects are placed on supporting surfaces such as cabinets, desks, and dining tables. A VLM-based refiner iteratively corrects furniture and object placement, and a 3D generative model allows individual assets to be replaced. The pipeline supports fine-grained user control, attaches basic physical attributes, and produces simulation-ready outputs; in experiments and user studies it outperforms prior methods in layout diversity and 3D design appeal. The authors plan to release the 300K-floorplan dataset and 5K fully furnished scenes to support future research.
📝 Abstract
Indoor scene generation is crucial for robot simulation and modern interior design. However, complex layouts together with scarce 3D scene data make learning-based generation challenging. Existing methods often rely on hand-crafted rules or focus on isolated sub-tasks (e.g., floorplan synthesis or single-room furnishing), producing whole-home scenes that lack global coherence, realism, and simulation readiness. To mitigate these limitations, we propose a unified hierarchical framework that decomposes indoor scene synthesis into controllable stages. First, we curate a large-scale dataset of 300K real residential floorplans to train a large language model for whole-home floorplan generation. With detailed descriptions and a K-D tree-based representation, our method enables fine-grained, controllable whole-home floorplan generation. Building upon the generated whole-home floorplan, we leverage image generation models to draft furniture layouts from multi-level roaming viewpoints, and then generate the layouts of small manipulable objects on different supporting surfaces (e.g., cabinets, desks, and dining tables) for embodied AI simulation. During furniture and object layout generation, a VLM-based refiner iteratively corrects furniture and object placement, and a 3D generative model enables flexible replacement of individual assets. We further attach basic physical attributes and simple surface texture and lighting setups to complete the pipeline for embodied AI use. Experiments and user studies demonstrate that our pipeline produces indoor spaces with greater layout diversity and stronger 3D design appeal, outperforming prior methods on both quantitative and qualitative metrics. Finally, alongside our generation pipeline, we will release the floorplan dataset and 5K fully furnished scenes to the community. Project Page: https://kairos-homeworld.github.io/
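The abstract names a K-D tree-based floorplan representation but does not spell out its format. As a rough illustration of the general idea only, the sketch below encodes a rectangular home as a binary tree of axis-aligned splits, recovers each room's rectangle, and flattens the tree into a compact string of the kind a language model could emit. Every name here (`Room`, `Split`, `layout`, `serialize`, the `ratio` field, and the string format) is a hypothetical assumption for illustration, not the authors' actual encoding.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Hypothetical node types: NOT the paper's format, just a generic
# K-D-tree-style partition of a rectangle into rooms.

@dataclass
class Room:
    label: str  # leaf node: a single room

@dataclass
class Split:
    axis: str      # "x" cuts the width, "y" cuts the height
    ratio: float   # fraction of the parent's extent given to `first`
    first: "Node"
    second: "Node"

Node = Union[Room, Split]
Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def layout(node: Node, x: float, y: float, w: float, h: float) -> Dict[str, Rect]:
    """Recursively assign each leaf room its rectangle inside (x, y, w, h)."""
    if isinstance(node, Room):
        return {node.label: (x, y, w, h)}
    if node.axis == "x":
        w1 = w * node.ratio
        rects = layout(node.first, x, y, w1, h)
        rects.update(layout(node.second, x + w1, y, w - w1, h))
    else:
        h1 = h * node.ratio
        rects = layout(node.first, x, y, w, h1)
        rects.update(layout(node.second, x, y + h1, w, h - h1))
    return rects


def serialize(node: Node) -> str:
    """Flatten the tree into a prefix-notation string (assumed format)."""
    if isinstance(node, Room):
        return node.label
    return f"({node.axis} {node.ratio:g} {serialize(node.first)} {serialize(node.second)})"


# A 12 m x 8 m home: left half split into living room and kitchen,
# right half split into bedroom (3/4) and bathroom (1/4).
home = Split("x", 0.5,
             Split("y", 0.5, Room("living"), Room("kitchen")),
             Split("y", 0.75, Room("bedroom"), Room("bathroom")))

rects = layout(home, 0.0, 0.0, 12.0, 8.0)
text = serialize(home)
```

One appeal of a tree of splits is that the rooms tile the footprint without gaps or overlaps by construction, so a generator that emits a well-formed tree cannot produce overlapping rooms. Real floorplans are often non-rectangular, which a scheme like this would need extra handling for.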