Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection

📅 2025-07-17
🤖 AI Summary
In construction-site worker detection, the scarcity of annotated real-world data limits deep learning performance. To address this, the paper proposes a synthetic data generation method that uses Midjourney, a text-to-image generative AI platform, with systematic prompt design to produce 12,000 diverse, realistic images from 3,000 prompts. The images are manually labeled and used to train a deep neural network (DNN) for object detection. On a real-world construction test set, the model achieves AP@0.5 = 0.937 and AP@0.5:0.95 = 0.642; on the synthetic test set, it attains AP@0.5 = 0.994 and AP@0.5:0.95 = 0.919. The gap between the two, widest under the stricter AP@0.5:0.95 metric, shows both the promise and the practical limitations of generative AI as a remedy for training data scarcity in construction automation.

📝 Abstract
While recent advancements in deep neural networks (DNNs) have substantially enhanced the capabilities of visual AI, insufficient data diversity and volume remain a challenge, particularly in the construction domain. This study presents a novel image synthesis methodology tailored for construction worker detection, leveraging the generative AI platform Midjourney. The approach entails generating 12,000 synthetic images from 3,000 different prompts, with an emphasis on image realism and diversity. These images, after manual labeling, serve as a dataset for DNN training. Evaluation on a real construction image dataset yielded promising results, with the model attaining average precisions (APs) of 0.937 at an intersection-over-union (IoU) threshold of 0.5 and 0.642 averaged over IoU thresholds from 0.5 to 0.95. Notably, the model demonstrated near-perfect performance on the synthetic dataset, achieving APs of 0.994 and 0.919 under the same two settings. These findings reveal both the potential and the weaknesses of generative AI in addressing DNN training data scarcity.
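The two AP figures in the abstract differ only in how strictly a predicted box must overlap a ground-truth box to count as correct. The sketch below illustrates this for a single image and a single class. It is a simplified illustration, not the authors' evaluation code: the function names are hypothetical, and it assumes all-point interpolation of the precision-recall curve, whereas the exact protocol used in the paper is not stated here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Tuple[float, Box]], gts: List[Box], thr: float) -> float:
    """AP at one IoU threshold (assumed: greedy matching, all-point interpolation)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tps = []
    # Visit detections from highest to lowest confidence score.
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j] and iou(box, gt) > best_iou:
                best_iou, best_j = iou(box, gt), j
        hit = best_j >= 0 and best_iou >= thr
        if hit:
            matched[best_j] = True
        tps.append(1 if hit else 0)
    # Cumulative precision and recall after each detection.
    precisions, recalls, tp_cum = [], [], 0
    for k, tp in enumerate(tps, start=1):
        tp_cum += tp
        precisions.append(tp_cum / k)
        recalls.append(tp_cum / len(gts))
    # Make precision non-increasing in recall, then integrate the curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


# IoU thresholds 0.50, 0.55, ..., 0.95 used for the averaged metric.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def ap_50_95(dets: List[Tuple[float, Box]], gts: List[Box]) -> float:
    """Mean of AP over the ten IoU thresholds from 0.5 to 0.95."""
    return sum(average_precision(dets, gts, t) for t in THRESHOLDS) / len(THRESHOLDS)
```

A detection that overlaps its target with an IoU of 0.78 counts as correct at the 0.5 threshold but fails at 0.8 and above, which is why the averaged metric is always the harder of the two and why it is more sensitive to imprecise box localization.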
Problem

Research questions and friction points this paper is trying to address.

Addressing inadequate data diversity in construction worker detection
Generating synthetic images using Midjourney for DNN training
Evaluating model performance on real and synthetic construction datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI platform Midjourney for synthetic images
3,000 prompts yield 12,000 diverse, realistic images
Synthetic dataset trains DNN for worker detection
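One way to picture how 3,000 distinct prompts could be formulated is to combine a small set of scene attributes. The sketch below is only an illustration under stated assumptions: the template, the attribute names, and every attribute value are hypothetical placeholders, not the authors' prompts, and the code builds text strings only (it does not call Midjourney). The ratio of 12,000 images to 3,000 prompts works out to four images per prompt.

```python
import itertools
import random
from typing import List

# Hypothetical attribute values chosen for illustration only.
ATTRIBUTES = {
    "activity": [
        "laying bricks", "operating a jackhammer", "carrying lumber",
        "welding a steel beam", "pouring concrete", "climbing scaffolding",
        "reading blueprints", "tying rebar",
    ],
    "site": [
        "high-rise building site", "residential house frame",
        "road construction zone", "bridge deck", "excavation pit",
        "indoor renovation",
    ],
    "lighting": [
        "bright midday sun", "overcast sky", "golden hour",
        "night with floodlights", "light rain",
    ],
    "viewpoint": [
        "eye-level view", "aerial view", "low-angle view",
        "distant wide shot", "close-up shot",
    ],
    "ppe": [
        "yellow hard hat and orange vest", "white hard hat and reflective jacket",
        "blue hard hat and harness", "red hard hat and gloves",
    ],
}

# Hypothetical prompt template.
TEMPLATE = ("realistic photo of a construction worker {activity}, {site}, "
            "{lighting}, {viewpoint}, wearing {ppe}")


def build_prompts(n: int, seed: int = 0) -> List[str]:
    """Sample n distinct attribute combinations and render them as prompts."""
    keys = list(ATTRIBUTES)
    combos = list(itertools.product(*(ATTRIBUTES[k] for k in keys)))
    if n > len(combos):
        raise ValueError(f"only {len(combos)} distinct combinations available")
    rng = random.Random(seed)  # fixed seed keeps the prompt set reproducible
    chosen = rng.sample(combos, n)
    return [TEMPLATE.format(**dict(zip(keys, combo))) for combo in chosen]


prompts = build_prompts(3000)
images_per_prompt = 12000 // len(prompts)  # four images per prompt
```

Sampling without replacement from the full attribute product guarantees that no two prompts are identical, which is one simple way to push for the diversity the paper emphasizes.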
Hongyang Zhao
Department of Civil and Mineral Engineering, University of Toronto, 35 St. George Street, Toronto, ON, Canada
Tianyu Liang
Department of Civil and Mineral Engineering, University of Toronto, 35 St. George Street, Toronto, ON, Canada
Sina Davari
University of Toronto
Computer Vision, Generative AI, Structural Engineering
Daeho Kim
University of Toronto
Civil Engineering