BIMgent: Towards Autonomous Building Modeling via Computer-use Agents

📅 2025-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing research primarily targets generic desktop automation and lacks intelligent agents specialized for highly domain-specific GUI tasks, such as 3D modeling, in architecture, engineering, and construction (AEC). Method: This paper introduces computer-use agents into Building Information Modeling (BIM) authoring for the first time, proposing the first multimodal large language model (MLLM)-based GUI operation agent framework for BIM software. The framework enables end-to-end autonomous modeling, from conceptual input through workflow planning to executable interface actions in BIM software (e.g., Autodesk Revit), via a multi-stage task decomposition mechanism and a vision-language collaborative decision-making paradigm that integrates screen understanding, action generation, and software automation. Contribution/Results: Evaluated on real-world BIM modeling tasks, the framework achieves a 32% operation success rate (all baselines: 0%), produces designs of reasonable quality, reduces manual effort, and preserves user-specified design intent.

📝 Abstract

Existing computer-use agents primarily focus on general-purpose desktop automation tasks, with limited exploration of their application in highly specialized domains. In particular, the 3D building modeling process in the Architecture, Engineering, and Construction (AEC) sector involves open-ended design tasks and complex interaction patterns within Building Information Modeling (BIM) authoring software, which has yet to be thoroughly addressed by current studies. In this paper, we propose BIMgent, an agentic framework powered by multimodal large language models (LLMs), designed to enable autonomous building model authoring via graphical user interface (GUI) operations. BIMgent automates the architectural building modeling process, including multimodal input for conceptual design, planning of software-specific workflows, and efficient execution of the authoring GUI actions. We evaluate BIMgent on real-world building modeling tasks, including both text-based conceptual design generation and reconstruction from existing building design. The design quality achieved by BIMgent was found to be reasonable. Its operations achieved a 32% success rate, whereas all baseline models failed to complete the tasks (0% success rate). Results demonstrate that BIMgent effectively reduces manual workload while preserving design intent, highlighting its potential for practical deployment in real-world architectural modeling scenarios.
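
The stages named in the abstract (conceptual design input, software-specific workflow planning, execution of GUI actions) can be pictured as a plan-then-execute loop. The following is a minimal Python sketch under stated assumptions, not BIMgent's implementation: every name (`GuiAction`, `Step`, `plan_workflow`, `run`) is hypothetical, the planner is a hard-coded stub standing in for a multimodal LLM, and the step-level success fraction is an illustrative metric that may differ from how the reported 32% figure is computed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GuiAction:
    """One low-level GUI step (hypothetical schema)."""
    kind: str    # e.g. "click" or "type"
    target: str  # UI element label, or the text to type


@dataclass
class Step:
    """One software-specific workflow step and the GUI actions it needs."""
    description: str
    actions: List[GuiAction]


def plan_workflow(design_brief: str) -> List[Step]:
    """Stub planner (assumption): a real system would query a multimodal LLM."""
    return [
        Step("Create exterior walls", [GuiAction("click", "Wall tool"),
                                       GuiAction("type", "10000")]),
        Step("Place entrance door", [GuiAction("click", "Door tool")]),
    ]


def run(design_brief: str,
        execute: Callable[[GuiAction], bool],
        max_retries: int = 1) -> float:
    """Execute each planned step, retrying failed steps from their first action.

    Returns the fraction of steps whose actions all succeeded.
    """
    steps = plan_workflow(design_brief)
    succeeded = 0
    for step in steps:
        for _ in range(max_retries + 1):
            # all() stops at the first failing action in the step
            if all(execute(action) for action in step.actions):
                succeeded += 1
                break
    return succeeded / len(steps) if steps else 0.0
```

Here `execute` is a caller-supplied callback that performs one action and reports whether it worked; in a real setup it would wrap a GUI automation layer and a screenshot-based check. Passing `lambda a: True` makes every step succeed, which is useful for dry-running the planner output.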

Problem

Research questions and friction points this paper is trying to address.

Autonomous 3D building modeling in the AEC sector

Complex BIM software interaction automation

Multimodal LLM-based framework for design tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal LLMs for autonomous building modeling

GUI-based workflow planning and execution

Automated architectural design and reconstruction

🔎 Similar Papers

Large Model Based Agents: State-of-the-Art, Cooperation Paradigms, Security and Privacy, and Future Trends