🤖 AI Summary
Problem: High-quality, publicly available datasets for fine-grained, multi-class door detection in architectural floor plans are scarce, hindering progress in building compliance verification and indoor scene understanding.
Method: This paper proposes a semi-automatic data construction framework integrating object detection with large language models (LLMs). First, state-of-the-art object detectors precisely localize door instances. Second, a multimodal prompting strategy guides an LLM to perform fine-grained classification—leveraging both visual features and contextual semantics. Third, a lightweight human-in-the-loop verification module ensures label accuracy and consistency.
Contribution/Results: The framework reduces manual annotation effort by ~60% while improving class consistency and structural annotation precision. We release DoorPlan—the first open-source, large-scale, fine-grained door detection dataset featuring eight functional door categories—establishing a new benchmark for regulatory compliance checking, indoor scene parsing, and training downstream neural networks.
📝 Abstract
Accurate detection and classification of diverse door types in floor plan drawings is critical for multiple applications, such as building compliance checking and indoor scene understanding. Despite this importance, publicly available datasets specifically designed for fine-grained multi-class door detection remain scarce. In this work, we present a semi-automated pipeline that leverages a state-of-the-art object detector and a large language model (LLM) to construct a multi-class door detection dataset with minimal manual effort. Doors are first detected as a single unified category using a deep object detection model. Next, an LLM classifies each detected instance based on its visual and contextual features. Finally, a human-in-the-loop stage ensures high-quality labels and bounding boxes. Our method significantly reduces annotation cost while producing a dataset suitable for benchmarking neural models in floor plan analysis. This work demonstrates the potential of combining deep learning and multimodal reasoning for efficient dataset construction in complex real-world domains.
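The three stages described above (single-class detection, LLM classification, human review) could be wired together roughly as in the sketch below. It is not the authors' implementation: the detector and classifier are passed in as plain callables, and the names `build_annotations`, `DoorAnnotation`, `fake_detect`, `fake_classify`, the two thresholds, and the labels "hinged" and "sliding" are all hypothetical placeholders chosen for illustration. Routing low-confidence labels to a reviewer is likewise an assumption about how the human-in-the-loop stage might be prioritized.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


@dataclass
class DoorAnnotation:
    box: Box
    det_score: float
    label: Optional[str] = None
    label_conf: float = 0.0
    needs_review: bool = False  # True -> send to the human reviewer first


def build_annotations(
    image,
    detect: Callable[[object], List[Tuple[Box, float]]],
    classify: Callable[[object, Box], Tuple[str, float]],
    det_threshold: float = 0.5,     # hypothetical value
    review_threshold: float = 0.8,  # hypothetical value
) -> List[DoorAnnotation]:
    """Stage 1: detect doors as one class. Stage 2: classify each box.
    Stage 3: flag low-confidence labels for human verification."""
    annotations = []
    for box, score in detect(image):
        if score < det_threshold:
            continue  # discard weak detections
        label, conf = classify(image, box)
        annotations.append(
            DoorAnnotation(box, score, label, conf,
                           needs_review=conf < review_threshold)
        )
    return annotations


# Stand-ins for a real detector and an LLM call (assumptions, not real APIs).
def fake_detect(image):
    return [((10, 10, 40, 60), 0.95),
            ((100, 20, 130, 70), 0.30),
            ((200, 50, 260, 80), 0.88)]


def fake_classify(image, box):
    width, height = box[2] - box[0], box[3] - box[1]
    return ("sliding", 0.9) if width > height else ("hinged", 0.6)


if __name__ == "__main__":
    for ann in build_annotations(None, fake_detect, fake_classify):
        print(ann)
```

In practice `detect` would wrap a trained object detector and `classify` would wrap a multimodal LLM prompt that receives the cropped door and its surrounding context; keeping both as injected callables lets either component be swapped without touching the review logic.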