Does GenAI Make Usability Testing Obsolete?

📅 2024-11-01

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Traditional usability testing is resource-intensive, which makes early detection of usability issues in iOS apps difficult for small development teams. Method: The paper proposes UX-LLM, a tool built on a large vision-language model (VLM) that predicts usability issues by processing interface screenshots together with the app's source code, so it can also flag issues in less common user paths. Contribution/Results: On two medium-complexity open-source iOS apps, usability experts assessed UX-LLM's predictions, and the results were compared with traditional usability testing and expert review. UX-LLM reached a precision of 0.61 to 0.66 and a recall of 0.35 to 0.38, so its predictions are often valid but it misses the majority of issues. In a separate focus group, a capstone development team reported that the tool found usability issues they had not known about, but raised concerns about integrating it into their workflow. UX-LLM therefore does not replace conventional usability evaluation. It serves as a supplement, particularly for small teams with limited resources.

📝 Abstract

Ensuring usability is crucial for the success of mobile apps. Usability issues can compromise user experience and negatively impact the perceived app quality. This paper presents UX-LLM, a novel tool powered by a Large Vision-Language Model that predicts usability issues in iOS apps. To evaluate the performance of UX-LLM, we predicted usability issues in two open-source apps of medium complexity and asked usability experts to assess the predictions. We also performed traditional usability testing and expert review for both apps and compared the results to those of UX-LLM. UX-LLM demonstrated precision ranging from 0.61 to 0.66 and recall between 0.35 and 0.38, indicating that it identifies valid usability issues yet fails to capture the majority of issues. Finally, we conducted a focus group with the app development team of a capstone project developing a transit app for visually impaired persons. The focus group expressed positive perceptions of UX-LLM, as it identified unknown usability issues in their app. However, they also raised concerns about its integration into the development workflow and suggested potential improvements. Our results show that UX-LLM cannot fully replace traditional usability evaluation methods but serves as a valuable supplement, particularly for small teams with limited resources, and that its ability to inspect the source code helps it identify issues in less common user paths.
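To make the reported metrics concrete, the following is a minimal sketch of how precision and recall are conventionally computed from a set of predicted issues and a set of reference issues. The function name, the issue labels, and the counts are all hypothetical illustrations. They are not data from the paper, and the paper's exact matching procedure may differ.

```python
from __future__ import annotations


def precision_recall(predicted: set[str], reference: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for predicted issues against reference issues.

    precision = correct predictions / all predictions
    recall    = correct predictions / all reference issues
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example: 5 predicted issues, 8 reference issues, 3 in common.
predicted_issues = {"low-contrast label", "tiny tap target", "no loading state",
                    "ambiguous icon", "missing undo"}
reference_issues = {"low-contrast label", "tiny tap target", "no loading state",
                    "confusing navigation", "unclear error text",
                    "hidden settings", "no empty state", "long form"}

precision, recall = precision_recall(predicted_issues, reference_issues)
print(precision, recall)
```

A precision well above the recall, as in the abstract, means that most flagged issues are valid while many real issues go unflagged.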

Problem

Research questions and friction points this paper is trying to address.

Predicting usability issues in iOS apps using AI

Comparing AI tool performance with traditional usability testing

Assessing AI integration in development workflows for small teams

Innovation

Methods, ideas, or system contributions that make the work stand out.

UX-LLM uses a Large Vision-Language Model

Predicts usability issues in iOS apps

Analyzes source code to surface issues in less common user paths
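The idea of giving a vision-language model both a screenshot and the source code of a view could be sketched roughly as follows. This is an illustration only: the function name, the payload keys, and the instruction text are assumptions, not UX-LLM's actual prompt or any particular model provider's request format.

```python
import base64


def build_review_request(view_source: str, screenshot_png: bytes) -> dict:
    """Bundle one view's source code and screenshot into a generic payload.

    The payload layout is hypothetical. A real integration would map these
    fields onto whatever request format the chosen model API expects.
    """
    instruction = (
        "Review the attached app view for usability issues. "
        "Use the screenshot for visual problems and the source code "
        "for behavior that is not visible on screen."
    )
    return {
        "instruction": instruction,
        "source_code": view_source,
        # Binary image data is base64-encoded so it can travel as text.
        "screenshot_base64": base64.b64encode(screenshot_png).decode("ascii"),
    }


# Hypothetical inputs: a tiny SwiftUI snippet and placeholder image bytes.
request = build_review_request(
    view_source='struct SettingsView: View { var body: some View { Text("Settings") } }',
    screenshot_png=b"\x89PNG placeholder bytes",
)
print(sorted(request.keys()))
```

Including the source code alongside the image is what lets such a tool reason about states a tester might never navigate to, which matches the paper's point about less common user paths.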

🔎 Similar Papers

The Future of Software Testing: AI-Powered Test Case Generation and Validation