Programmable-Room: Interactive Textured 3D Room Meshes Generation Empowered by Large Language Models

📅 2025-06-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of text-driven interactive 3D room mesh generation and editing under natural language instructions, specifically targeting fine-grained attribute control and high-fidelity texture modeling. Methodologically, it introduces a visual programming framework in which a large language model parses textual commands into executable module sequences coordinating room-coordinate generation, panoramic image rendering, mesh reconstruction, and furniture layout. A bidirectional LSTM is integrated to optimize a 1D representation of the panorama, enhancing conditional controllability and texture-mapping accuracy. Furthermore, a diffusion-based generative model coupled with semantic guidance ensures geometric, textural, and semantic consistency. Qualitative and quantitative evaluations demonstrate that the proposed method surpasses state-of-the-art approaches in generation quality, editing flexibility, and semantic fidelity, enabling high-fidelity, editable, and texture-complete 3D room modeling.

📝 Abstract
We present Programmable-Room, a framework which interactively generates and edits a 3D room mesh given natural language instructions. For precise control of each attribute of a room, we decompose the challenging task into simpler steps: creating plausible 3D coordinates for room meshes, generating panorama images for the texture, constructing 3D meshes by integrating the coordinates and panorama texture images, and arranging furniture. To support the various decomposed tasks within a unified framework, we incorporate visual programming (VP). VP is a method that utilizes a large language model (LLM) to write a Python-like program, an ordered list of the modules needed for the various tasks given in natural language. We develop most of the modules ourselves. In particular, for the texture-generation module, we utilize a pretrained large-scale diffusion model to generate panorama images conditioned on text and visual prompts (i.e., layout, depth, and semantic map) simultaneously. Specifically, we enhance the panorama image generation quality by optimizing the training objective with a 1D representation of a panorama scene obtained from a bidirectional LSTM. We demonstrate Programmable-Room's flexibility in generating and editing 3D room meshes, and show our framework's superiority to an existing model quantitatively and qualitatively. The project page is available at https://jihyun0510.github.io/Programmable_Room_Page/.
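The visual-programming idea above can be sketched as follows. This is a minimal, hypothetical illustration: the module names (`generate_shape`, `generate_texture`, `construct_mesh`) and the program structure are assumptions for exposition, not the paper's actual API; the real modules would invoke coordinate generation, diffusion-based panorama synthesis, and mesh construction.

```python
# Hypothetical sketch of a visual-programming (VP) pipeline: an LLM would emit
# an ordered, Python-like program whose steps call task modules. All module
# names and return values here are illustrative placeholders.

def generate_shape(instruction: str) -> dict:
    # Placeholder: would produce plausible 3D coordinates for the room mesh.
    return {"coords": f"coords for: {instruction}"}

def generate_texture(instruction: str, shape: dict) -> dict:
    # Placeholder: would render a panorama texture conditioned on text and
    # visual prompts (layout, depth, semantic map) via a diffusion model.
    return {"panorama": f"texture for: {instruction}"}

def construct_mesh(shape: dict, texture: dict) -> dict:
    # Placeholder: would integrate the coordinates and panorama into a mesh.
    return {"mesh": (shape["coords"], texture["panorama"])}

def run_program(instruction: str) -> dict:
    """Execute the module sequence an LLM might emit for one instruction."""
    shape = generate_shape(instruction)
    texture = generate_texture(instruction, shape)
    return construct_mesh(shape, texture)

room = run_program("a cozy bedroom with a wooden floor")
```

Decomposing the task this way is what lets a single framework handle both generation and editing: an edit instruction would simply map to a different (possibly partial) module sequence.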
Problem

Research questions and friction points this paper is trying to address.

Generates 3D room meshes from natural language instructions
Decomposes complex tasks into simpler steps for precise control
Enhances panorama image quality using optimized diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive 3D room mesh generation via LLM
Visual programming for unified task decomposition
Diffusion model enhanced panorama texture generation
Jihyun Kim
Department of Electronic Engineering, Sogang University, Seoul, South Korea
Junho Park
Department of Electronic Engineering, Sogang University, Seoul, South Korea
Kyeongbo Kong
Department of Electrical and Electronics Engineering, Pusan National University, Pusan, South Korea
Suk-Ju Kang
Sogang University
Image processing, video processing, multimedia signal processing, circuit design for display and multimedia systems, deep learning