Will AI also replace inspectors? Investigating the potential of generative AIs in usability inspection

📅 2025-10-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether generative AI can replace human experts in conducting usability inspections and evaluates its capability to identify interface interaction defects. Using GPT-4o and Gemini 2.5 Flash, we conducted parallel usability inspections alongside human experts on real-world software interface tasks, quantitatively comparing performance via precision, recall, and F1-score. Results indicate that generative AI cannot yet independently substitute for human experts, as its recall is significantly lower; nevertheless, it detects approximately 23% of novel defects missed by humans. Human–AI collaboration increases overall defect coverage by 37%, confirming complementary strengths. The primary contribution is the first empirical demonstration of generative AI's "incremental defect detection value" in usability engineering, alongside a proposed evaluation framework that prioritizes synergistic human–AI collaboration over replacement. This paradigm shift emphasizes augmentation rather than automation in usability assurance.

📝 Abstract
Usability inspection is a well-established technique for identifying interaction issues in software interfaces, thereby contributing to improved product quality. However, it is a costly process that requires time and specialized knowledge from inspectors. With advances in Artificial Intelligence (AI), new opportunities have emerged to support this task, particularly through generative models capable of interpreting interfaces and performing inspections more efficiently. This study examines the performance of generative AIs in identifying usability problems, comparing their performance to that of experienced human inspectors. A software prototype was evaluated by four specialists and two AI models (GPT-4o and Gemini 2.5 Flash), using metrics such as precision, recall, and F1-score. While inspectors achieved the highest levels of precision and overall coverage, the AIs demonstrated high individual performance and discovered many novel defects, but with a higher rate of false positives and redundant reports. The combination of AIs and human inspectors produced the best results, revealing their complementarity. These findings suggest that AI, in its current stage, cannot replace human inspectors but can serve as a valuable augmentation tool to improve efficiency and expand defect coverage. The results provide evidence based on quantitative analysis to inform the discussion on the role of AI in usability inspections, pointing to viable paths for its complementary use in software quality assessment contexts.
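The metrics named in the abstract can be made concrete with a minimal sketch. The code below is illustrative only, not the paper's actual evaluation pipeline: it assumes each inspector's defect reports have already been matched against a consolidated ground-truth list of real defects, with defect IDs and counts invented for the example.

```python
# Hedged sketch of the precision / recall / F1 computation described in the
# abstract. Defect IDs and the matching step are hypothetical; the paper's
# own consolidation procedure is not specified here.

def inspection_metrics(reported: set[str], ground_truth: set[str]) -> dict[str, float]:
    """Precision, recall, and F1 for one inspector's matched defect reports."""
    true_positives = len(reported & ground_truth)        # real defects found
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Illustrative run: 4 reports, 3 matching the 6 known defects.
metrics = inspection_metrics(
    {"D1", "D2", "D3", "FP1"},
    {"D1", "D2", "D3", "D4", "D5", "D6"},
)
# precision = 3/4, recall = 3/6, F1 = 0.6
```

Under this framing, an AI inspector with many redundant or spurious reports loses precision, while one that misses defects found by humans loses recall, which matches the complementarity pattern the abstract describes.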
Problem

Research questions and friction points this paper is trying to address.

Investigating AI potential in replacing human usability inspectors
Comparing generative AI performance with human inspection accuracy
Exploring complementary use of AI and humans in defect detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI models interpret interfaces automatically
AI-human combination achieves best inspection results
AI augments human inspectors for defect coverage
Luis F. G. Campos
Federal University of Technology - Paraná (UTFPR), Campo Mourão, Paraná, Brazil
Leonardo C. Marques
SiDi - Intelligence & Innovation Center, Manaus, Amazonas, Brazil
Walter T. Nakamura
Universidade Tecnológica Federal do Paraná (UTFPR)
Human-Computer Interaction · Software Engineering · Usability · User eXperience · Machine Learning