SceneGenAgent: Precise Industrial Scene Generation with Coding Agent

📅 2024-10-29
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to generate industrial manufacturing simulation scenes that satisfy precise dimensional and spatial constraints. Method: This paper proposes SceneGenAgent, an LLM-based agent framework that generates industrial scenes as C# code, combining structured layout planning, layout verification, and iterative refinement. The authors also construct SceneInstruct, a dataset for fine-tuning open-source LLMs so that they can be integrated into SceneGenAgent. Contribution/Results: LLMs powered by SceneGenAgent exceed their original performance, reaching up to an 81.0% success rate on real-world industrial scene generation tasks. Fine-tuning open-source LLMs on SceneInstruct yields significant improvements, with Llama3.1-70B approaching the capabilities of GPT-4o. Code and data are publicly released.

📝 Abstract
The modeling of industrial scenes is essential for simulations in industrial manufacturing. While large language models (LLMs) have shown significant progress in generating general 3D scenes from textual descriptions, generating industrial scenes with LLMs poses a unique challenge due to their demand for precise measurements and positioning, requiring complex planning over spatial arrangement. To address this challenge, we introduce SceneGenAgent, an LLM-based agent for generating industrial scenes through C# code. SceneGenAgent ensures precise layout planning through a structured and calculable format, layout verification, and iterative refinement to meet the quantitative requirements of industrial scenarios. Experiment results demonstrate that LLMs powered by SceneGenAgent exceed their original performance, reaching up to 81.0% success rate in real-world industrial scene generation tasks and effectively meeting most scene generation requirements. To further enhance accessibility, we construct SceneInstruct, a dataset designed for fine-tuning open-source LLMs to integrate into SceneGenAgent. Experiments show that fine-tuning open-source LLMs on SceneInstruct yields significant performance improvements, with Llama3.1-70B approaching the capabilities of GPT-4o. Our code and data are available at https://github.com/THUDM/SceneGenAgent .

Problem

Research questions and friction points this paper is trying to address.

Generating precise industrial scenes with LLMs
Ensuring accurate measurements and spatial arrangement
Improving accessibility via fine-tuning open-source LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based agent generates industrial scenes via C#
Ensures precision with structured layout and verification
Fine-tunes open-source LLMs using SceneInstruct dataset
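
The verify-then-refine idea listed above can be illustrated with a minimal Python sketch. This is not the paper's implementation: SceneGenAgent emits C# and relies on an LLM to revise the layout, whereas this sketch swaps in a deterministic rule (shift an object along x) purely to show the control flow. The names `Box`, `overlaps`, `find_collisions`, and `refine`, the 2D axis-aligned footprint model, and the 0.5 m gap are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned footprint of one object on the floor (metres)."""
    name: str
    x: float      # x of the minimum corner
    y: float      # y of the minimum corner
    width: float  # extent along x
    depth: float  # extent along y


def overlaps(a: Box, b: Box) -> bool:
    """True when the two footprints share interior area (touching edges is allowed)."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.depth and b.y < a.y + a.depth)


def find_collisions(layout):
    """Verification step: list every pair of object names whose footprints overlap."""
    return [(a.name, b.name)
            for i, a in enumerate(layout)
            for b in layout[i + 1:]
            if overlaps(a, b)]


def refine(layout, gap=0.5, max_rounds=10):
    """Toy refinement loop: fix one collision per round, then verify again.

    The second object of the first colliding pair is moved to the right of
    the first object, leaving `gap` metres between them. A real agent would
    ask the LLM for a revised layout at this point instead.
    """
    layout = list(layout)
    for _ in range(max_rounds):
        collisions = find_collisions(layout)
        if not collisions:
            return layout
        first, second = collisions[0]
        by_name = {box.name: box for box in layout}
        anchor, mover = by_name[first], by_name[second]
        moved = replace(mover, x=anchor.x + anchor.width + gap)
        layout = [moved if box.name == second else box for box in layout]
    if find_collisions(layout):
        raise ValueError("layout still has collisions after refinement")
    return layout


if __name__ == "__main__":
    scene = [
        Box("robot", 0.0, 0.0, 2.0, 2.0),
        Box("conveyor", 1.0, 0.0, 4.0, 1.0),
        Box("pallet", 4.0, 0.0, 1.0, 1.0),
    ]
    print(find_collisions(scene))
    for box in refine(scene):
        print(box)
```

Keeping the layout as explicit numeric fields is what makes the check computable: every violation can be named and reported back for another refinement round, rather than judged by eye.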

Authors

Xiao Xia
Tsinghua University

Dan Zhang
Tsinghua University

Zibo Liao
Foundational Technologies, Siemens Ltd., China

Zhenyu Hou
Tsinghua University
Language model reasoning, Graph neural networks

Tian-Heng Sun
Foundational Technologies, Siemens Ltd., China

Jing Li
Foundational Technologies, Siemens Ltd., China

Ling Fu
Master's student in Computer Science, Huazhong University of Science and Technology
Computer vision

Yuxiao Dong
CS, Tsinghua University
Large Language Models, Vision Language Models, LLM Reasoning, LLM Agent, Graph Machine Learning