A Zero-Shot Multi-Agent Framework for Human-Building Interaction via Programmatic Reasoning

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three challenges in applying large language models to the building domain: scarce training data, difficulty integrating domain-specific knowledge, and the tension between natural interaction and technical accuracy. It proposes a zero-shot hierarchical multi-agent framework that decouples natural language understanding from building analytics through semantic routing and programmatic reasoning. A "Doorman" mechanism decomposes user queries into subtasks, and specialized coding agents generate executable Python scripts for precise arithmetic instead of relying on standard retrieval-augmented generation (RAG). Validated on a dataset from more than 200 commercial buildings, the framework provides accurate, contextual responses for diverse users, from tenants to building managers, without model fine-tuning.
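To make the idea of semantic routing concrete, here is a minimal sketch that sends a query to the agent whose description it most resembles. It is an illustration under stated assumptions, not the paper's implementation: the route names, their keyword descriptions, and the `route` function are hypothetical, and bag-of-words cosine similarity stands in for whatever embedding model a real system would use.

```python
import math
import re
from collections import Counter

# Hypothetical agents and descriptions; the paper does not list its routes.
ROUTES = {
    "energy_analytics": "energy electricity consumption usage kwh meter load",
    "comfort": "temperature thermal comfort hot cold humidity air",
    "maintenance": "fault alarm repair broken equipment maintenance",
}


def _vector(text):
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query, fallback="general_chat"):
    """Return the best-matching route, or the fallback if nothing overlaps."""
    query_vec = _vector(query)
    scores = {name: _cosine(query_vec, _vector(desc)) for name, desc in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else fallback


print(route("How much electricity did the building use last week?"))
print(route("My office is too cold"))
print(route("Hello there"))
```

Queries with no overlap fall through to a conversational fallback, which is one simple way to keep casual interaction separate from analytics.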

📝 Abstract

Large Language Models (LLMs) offer opportunities to enhance Human-Building Interaction (HBI) by enabling more direct interactions through intuitive interfaces to complex building systems. These systems are characterized by vast amounts of data across multiple formats, a lack of nonconfidential and generalizable information, and the need for domain expertise in interpretation. Applying LLMs to domain-specific tasks like HBI presents additional challenges. Limited training data makes traditional fine-tuning approaches impractical. Meanwhile, the opacity of LLM training data requires careful integration of domain knowledge to ensure reliability. Additionally, different LLMs exhibit varying alignment characteristics, suggesting that achieving both natural interaction and technical accuracy requires a multi-agent approach. These challenges highlight the need for innovative approaches to adapt LLMs for specialized domains while maintaining accuracy and user engagement. In this paper, we develop a hierarchical multi-agent framework that utilizes semantic routing and programmatic reasoning to decouple natural language understanding from building analytics. Instead of standard RAG approaches, our system employs a "Doorman" mechanism for task decomposition and specialized coding agents that generate executable Python scripts for precise arithmetic. We validate this framework on a dataset from more than 200 commercial buildings. Results demonstrate the framework's effectiveness in providing accurate and contextual responses for diverse stakeholders, from tenants to building managers, across various building system applications.
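The pattern described in the abstract, task decomposition followed by generated scripts that do the arithmetic, can be sketched as follows. This is a toy illustration under stated assumptions rather than the authors' system: `doorman`, `coding_agent`, `run_script`, and `answer` are hypothetical names, rule-based stubs stand in for the LLM calls, and the readings are made-up values, not data from the paper.

```python
# Hypothetical monthly electricity readings in kWh (illustrative values only).
READINGS = {
    "building_a": [1200.5, 1100.25, 1300.0],
    "building_b": [900.0, 950.5, 1010.75],
}


def doorman(query):
    """Stub for the LLM "Doorman": split a query into (operation, building) subtasks."""
    q = query.lower()
    operations = [op for op in ("total", "average") if op in q]
    buildings = [b for b in READINGS if b.replace("_", " ") in q]
    return [(op, b) for b in buildings for op in operations]


def coding_agent(operation, building):
    """Stub for an LLM coding agent: return Python source for one subtask."""
    expression = {
        "total": "sum(values)",
        "average": "sum(values) / len(values)",
    }[operation]
    return f"values = data[{building!r}]\nresult = {expression}\n"


def run_script(source, data):
    """Execute generated source with only `sum` and `len` exposed.

    Restricting builtins is NOT a real sandbox; a production system
    would need proper isolation for model-generated code.
    """
    namespace = {"data": data}
    exec(source, {"__builtins__": {"sum": sum, "len": len}}, namespace)
    return namespace["result"]


def answer(query):
    """Decompose the query, generate one script per subtask, and run each."""
    return {
        (op, building): run_script(coding_agent(op, building), READINGS)
        for op, building in doorman(query)
    }


print(answer("What is the total and average use of building a?"))
```

Because the numbers come from executed code rather than from text generation, the arithmetic is exact for the given data, which is the motivation the abstract gives for preferring scripts over standard RAG.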

Problem

Research questions and friction points this paper is trying to address.

Human-Building Interaction

Large Language Model

Zero-Shot Learning

Multi-Agent System

Domain Adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-Shot

Multi-Agent Framework

Programmatic Reasoning