Programmable-Room: Interactive Textured 3D Room Meshes Generation Empowered by Large Language Models

📅 2025-06-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of text-driven interactive 3D room mesh generation and editing under natural language instructions, specifically targeting fine-grained attribute control and high-fidelity texture modeling. Methodologically, it introduces a visual programming framework in which a large language model parses textual commands into executable module sequences coordinating room-coordinate generation, panoramic image rendering, mesh reconstruction, and furniture layout. A bidirectional LSTM is integrated to optimize a 1D representation of the panorama, enhancing conditional controllability and texture-mapping accuracy. Furthermore, a diffusion-based generative model coupled with semantic guidance ensures geometric, textural, and semantic consistency. Qualitative and quantitative evaluations demonstrate that the proposed method surpasses state-of-the-art approaches in generation quality, editing flexibility, and semantic fidelity, enabling high-fidelity, editable, and texture-complete 3D room modeling.

📝 Abstract
We present Programmable-Room, a framework which interactively generates and edits a 3D room mesh given natural language instructions. For precise control of each attribute of a room, we decompose the challenging task into simpler steps: creating plausible 3D coordinates for room meshes, generating panorama images for the texture, constructing 3D meshes by integrating the coordinates and panorama texture images, and arranging furniture. To support the various decomposed tasks within a unified framework, we incorporate visual programming (VP). VP is a method that utilizes a large language model (LLM) to write a Python-like program, an ordered list of the modules needed for the various tasks given in natural language. We develop most of the modules ourselves. In particular, for the texture-generation module, we utilize a pretrained large-scale diffusion model to generate panorama images conditioned on text and visual prompts (i.e., layout, depth, and semantic map) simultaneously. Specifically, we enhance the panorama image generation quality by optimizing the training objective with a 1D representation of a panorama scene obtained from a bidirectional LSTM. We demonstrate Programmable-Room's flexibility in generating and editing 3D room meshes, and show our framework's superiority to an existing model quantitatively and qualitatively. The project page is available at https://jihyun0510.github.io/Programmable_Room_Page/.
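The visual-programming idea above can be sketched as follows. This is a minimal, hypothetical illustration: the module names (`generate_shape`, `generate_texture`, `construct_mesh`) and the program structure are assumptions for exposition, not the paper's actual API; the real modules would invoke coordinate generation, diffusion-based panorama synthesis, and mesh construction.

```python
# Hypothetical sketch of a visual-programming (VP) pipeline: an LLM would emit
# an ordered, Python-like program whose steps call task modules. All module
# names and return values here are illustrative placeholders.

def generate_shape(instruction: str) -> dict:
    # Placeholder: would produce plausible 3D coordinates for the room mesh.
    return {"coords": f"coords for: {instruction}"}

def generate_texture(instruction: str, shape: dict) -> dict:
    # Placeholder: would render a panorama texture conditioned on text and
    # visual prompts (layout, depth, semantic map) via a diffusion model.
    return {"panorama": f"texture for: {instruction}"}

def construct_mesh(shape: dict, texture: dict) -> dict:
    # Placeholder: would integrate the coordinates and panorama into a mesh.
    return {"mesh": (shape["coords"], texture["panorama"])}

def run_program(instruction: str) -> dict:
    """Execute the module sequence an LLM might emit for one instruction."""
    shape = generate_shape(instruction)
    texture = generate_texture(instruction, shape)
    return construct_mesh(shape, texture)

room = run_program("a cozy bedroom with a wooden floor")
```

Decomposing the task this way is what lets a single framework handle both generation and editing: an edit instruction would simply map to a different (possibly partial) module sequence.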
Problem

Research questions and friction points this paper is trying to address.

Generates 3D room meshes from natural language instructions
Decomposes complex tasks into simpler steps for precise control
Enhances panorama image quality using optimized diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive 3D room mesh generation via LLM
Visual programming for unified task decomposition
Diffusion model enhanced panorama texture generation
Jihyun Kim
Department of Electronic Engineering, Sogang University, Seoul, South Korea
Junho Park
Department of Electronic Engineering, Sogang University, Seoul, South Korea
Kyeongbo Kong
Department of Electrical and Electronics Engineering, Pusan National University, Pusan, South Korea
Suk-Ju Kang
Sogang University
Image processing, video processing, multimedia signal processing, circuit design for display and multimedia systems, deep learning