🤖 AI Summary
This work addresses a limitation of existing text-driven 3D indoor scene generation methods, which focus mainly on object placement and neglect how a spatial layout supports its users' activities and needs. To bridge this gap, the authors propose Function2Scene, a layout generation framework driven by natural-language functional specifications that describe who will use a room and what they need to do there. The objective shifts from arranging objects plausibly to supporting human use. The method combines large language models, vision-language models, and geometric measurements with constraints derived from a taxonomy of 17 functional design criteria, all within an iterative "check-and-repair" loop. On 30 professionally written interior-design cases, the generated layouts were preferred over recent LLM-based scene synthesis baselines in 94.3% of pairwise comparisons, indicating that they better satisfy functional requirements.
📝 Abstract
Most text-driven 3D indoor scene synthesis methods generate rooms from object-centric prompts, asking what furniture should be placed rather than how the space is used. Yet in real interior design, a layout is judged by how well it supports its occupants, e.g., their activities and physical needs. We introduce Function2Scene, a framework for generating 3D indoor layouts from functional specifications, i.e., natural-language design briefs describing who will use a room and what they need to do there. Given such a specification, our system parses occupant personas and activities, derives a customized set of functional design constraints from a taxonomy of 17 criteria spanning spatial, ergonomic, activity, and environmental considerations, and uses these constraints to guide layout generation. Rather than relying on an LLM to directly produce a final scene, Function2Scene performs iterative evaluation and refinement through a tool-augmented check-and-repair loop, combining geometric measurements, LLM-based contextual reasoning, and VLM-based visual assessment. Experiments on 30 professionally written interior-design cases show that Function2Scene produces layouts that better satisfy functional requirements than recent LLM-based scene synthesis baselines, with our results preferred in 94.3% of pairwise comparisons. Our work reframes text-driven indoor scene synthesis from placing plausible objects to designing spaces that support human use.
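The check-and-repair loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Box` type, the single overlap check, and the "nudge along +x" repair are all hypothetical stand-ins, and the geometric, LLM-based, and VLM-based evaluators mentioned in the abstract are reduced here to one geometric test. The sketch only shows the control flow of evaluating a layout, repairing the first violation found, and re-checking until the layout passes or an iteration budget is spent.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned furniture footprint on the floor, in metres (hypothetical type)."""
    name: str
    x: float  # min-x corner
    y: float  # min-y corner
    w: float  # extent along x
    d: float  # extent along y


def overlaps(a: Box, b: Box) -> bool:
    """True if the two footprints share interior area (touching edges do not count)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.d and b.y < a.y + a.d)


def check(layout: List[Box]) -> Optional[Tuple[int, int]]:
    """Return the indices of the first overlapping pair, or None if the layout passes.

    Assumption: a real system would run many checks; this sketch uses only one.
    """
    for i in range(len(layout)):
        for j in range(i + 1, len(layout)):
            if overlaps(layout[i], layout[j]):
                return i, j
    return None


def repair(layout: List[Box], pair: Tuple[int, int], step: float = 0.25) -> List[Box]:
    """Nudge the second box of the violating pair by `step` metres along +x.

    Assumption: this naive repair is illustrative and ignores room boundaries.
    """
    _, j = pair
    fixed = list(layout)
    fixed[j] = replace(layout[j], x=layout[j].x + step)
    return fixed


def check_and_repair(layout: List[Box], max_iters: int = 10) -> Tuple[List[Box], bool, int]:
    """Alternate checking and repairing until the check passes or the budget runs out."""
    repairs = 0
    for _ in range(max_iters):
        violation = check(layout)
        if violation is None:
            return layout, True, repairs
        layout = repair(layout, violation)
        repairs += 1
    return layout, check(layout) is None, repairs


# Example: a desk that initially overlaps the bed is pushed clear of it.
initial = [Box("bed", 0.0, 0.0, 2.0, 1.5), Box("desk", 1.5, 0.0, 1.2, 0.6)]
final, ok, n_repairs = check_and_repair(initial)
```

In this toy case the desk is moved twice before the check passes. The sketch keeps checks and repairs as separate functions so that further checks (for example, clearance or reachability tests) could be added without changing the loop itself.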