Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning

📅 2026-04-21

🤖 AI Summary
This work addresses the frequent GUI layout defects that arise in multi-window mobile scenarios, such as split-screen and foldable devices, where existing detection methods struggle with dynamic layout changes. The paper proposes the first proactive, multi-window-aware GUI defect detection framework: it actively triggers multi-window states during automated exploration, uses Set-of-Mark for fine-grained UI element alignment, and employs a multimodal large language model with chain-of-thought prompting to detect, localize, and explain layout defects. In an evaluation on 50 real-world apps, multi-window settings expose 184% more text truncation defects than conventional full-screen settings. The approach achieves an application-level false positive rate (FPR) of 10.00% and false negative rate (FNR) of 11.11%, and an F1 score of 87.2% for widget occlusion detection, outperforming baselines such as OwlEye and YOLO-based detectors.
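The application-level and fine-grained figures above are standard confusion-matrix rates. The sketch below shows how they are usually computed, assuming the common definitions FPR = FP/(FP+TN), FNR = FN/(FN+TP), and F1 as the harmonic mean of precision and recall; the `Confusion` class is a hypothetical helper, and the counts in the usage lines are invented for illustration and are not the paper's confusion matrix.

```python
from dataclasses import dataclass


def _ratio(num: float, den: float) -> float:
    # Guard against empty classes (e.g. no defect-free apps in a sample).
    return num / den if den else 0.0


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts; at app level, 'positive' means defect-prone."""
    tp: int  # defective, flagged
    fp: int  # defect-free, flagged
    tn: int  # defect-free, not flagged
    fn: int  # defective, missed

    def fpr(self) -> float:
        # Share of defect-free items that were wrongly flagged.
        return _ratio(self.fp, self.fp + self.tn)

    def fnr(self) -> float:
        # Share of defective items the detector missed.
        return _ratio(self.fn, self.fn + self.tp)

    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return _ratio(2 * p * r, p + r)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    c = Confusion(tp=30, fp=3, tn=12, fn=5)
    print(f"FPR={c.fpr():.2%} FNR={c.fnr():.2%} F1={c.f1():.1%}")
    # prints: FPR=20.00% FNR=14.29% F1=88.2%
```

Note that FPR and FNR are normalized by different populations (defect-free vs. defective apps), so the two rates can move independently as the benchmark's class balance changes.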

📝 Abstract
Multi-window mobile scenarios, such as split-screen and foldable modes, make GUI display defects more likely by forcing applications to adapt to changing window sizes and dynamic layout reflow. Existing detection techniques are limited in two ways: they are largely passive, analyzing screenshots only after problematic states have been reached, and they are mainly designed for conventional full-screen interfaces, making them less effective in multi-window settings.

We propose an end-to-end framework for GUI display defect detection in multi-window mobile scenarios. The framework proactively triggers split-screen, foldable, and window-transition states during app exploration, uses Set-of-Mark (SoM) to align screenshots with widget-level interface elements, and leverages multimodal large language models with chain-of-thought prompting to detect, localize, and explain display defects. We also construct a benchmark of GUI display defects using 50 real-world Android applications.

Experimental results show that multi-window settings substantially increase the exposure of layout-related defects, with text truncation increasing by 184% compared with conventional full-screen settings. At the application level, our method detects 40 defect-prone apps with a false positive rate of 10.00% and a false negative rate of 11.11%, outperforming OwlEye and YOLO-based baselines. At the fine-grained level, it achieves the best F1 score of 87.2% for widget occlusion detection.
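Set-of-Mark prompting labels each candidate UI element with a number so that a multimodal model can refer to elements unambiguously, which is what makes localization possible. The sketch below is one assumed way to build the textual side of that alignment, not the authors' implementation: it reads widget bounds from an Android uiautomator XML dump (whose `node` elements carry a `bounds` attribute of the form `[left,top][right,bottom]`), assigns mark IDs to leaf widgets, builds a chain-of-thought prompt, and maps the model's final JSON answer back to screen boxes. Drawing the marks onto the screenshot is omitted, and the function names, prompt wording, `ANSWER:` convention, and JSON keys (`mark`, `defect`, `reason`) are all hypothetical.

```python
import json
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# uiautomator encodes bounds as "[left,top][right,bottom]".
_BOUNDS = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def assign_marks(hierarchy_xml: str) -> Dict[int, dict]:
    """Give each leaf widget with a non-empty on-screen box a numeric mark."""
    marks: Dict[int, dict] = {}
    for node in ET.fromstring(hierarchy_xml).iter("node"):
        if len(node) > 0:
            continue  # containers are skipped; only leaf widgets get marks
        m = _BOUNDS.fullmatch(node.get("bounds", ""))
        if m is None:
            continue
        left, top, right, bottom = (int(v) for v in m.groups())
        if right <= left or bottom <= top:
            continue  # zero-area widgets are not visible
        marks[len(marks) + 1] = {
            "box": (left, top, right, bottom),
            "class": node.get("class", ""),
            "text": node.get("text", ""),
        }
    return marks


def build_prompt(marks: Dict[int, dict]) -> str:
    """Hypothetical chain-of-thought prompt sent with the marked screenshot."""
    legend = "\n".join(
        f"[{i}] {w['class']} text={w['text']!r} box={w['box']}"
        for i, w in marks.items()
    )
    return (
        "Numbered marks label the UI elements in this multi-window screenshot:\n"
        f"{legend}\n"
        "Think step by step: describe the layout, then check every mark for "
        "truncated text or occlusion by another element. Finish with a line "
        'starting "ANSWER:" followed by a JSON list of objects with keys '
        '"mark", "defect" and "reason".'
    )


def localize(reply: str, marks: Dict[int, dict]) -> List[dict]:
    """Parse the JSON after the last 'ANSWER:' and attach each mark's box."""
    findings = []
    for item in json.loads(reply.rsplit("ANSWER:", 1)[-1]):
        widget = marks.get(int(item["mark"]))
        if widget is not None:  # drop marks the model invented
            findings.append({**item, "box": widget["box"]})
    return findings


if __name__ == "__main__":
    dump = (
        '<hierarchy rotation="0">'
        '<node class="android.widget.FrameLayout" text="" bounds="[0,0][1080,1200]">'
        '<node class="android.widget.TextView" text="Order summary for tod" bounds="[24,40][1056,100]"/>'
        '<node class="android.widget.Button" text="Pay" bounds="[24,90][300,160]"/>'
        "</node></hierarchy>"
    )
    marks = assign_marks(dump)
    print(build_prompt(marks))
    reply = 'Mark 1 ends mid-word. ANSWER: [{"mark": 1, "defect": "text truncation", "reason": "label cut off"}]'
    print(localize(reply, marks))
```

Splitting on the last `ANSWER:` keeps the free-form reasoning (which may itself mention marks like `[1]`) out of the JSON parser, and discarding unknown mark IDs keeps a hallucinated reference from producing a bogus location.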
Problem

Research questions and friction points this paper is trying to address.

GUI defects
multi-window scenarios
display defects
mobile applications
layout reflow
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal reasoning
proactive GUI testing
multi-window mobile scenarios
Set-of-Mark
display defect detection
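Proactive GUI testing in multi-window scenarios means driving the device into each window configuration before capturing the screen, instead of waiting for a defective state to occur. The abstract does not say how the framework triggers these states, so the sketch below is only an approximation under stated assumptions: `adb shell wm size WxH` and `adb shell wm size reset` are real Android shell commands that override the logical display size, but resizing the whole display only mimics split-screen and foldable layouts rather than entering the system's multi-window mode. The state list, the size formulas, and the `capture` hook are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional, Tuple


def window_states(width: int, height: int) -> List[Tuple[str, str]]:
    """Hypothetical (state name, `wm size` argument) pairs for one portrait phone."""
    return [
        ("full_screen", "reset"),  # restore the physical display size
        # Portrait split-screen stacks two apps, so each gets roughly half the height.
        ("split_half", f"{width}x{height // 2}"),
        # Crude stand-in for an unfolded, near-square inner display.
        ("unfolded", f"{height * 9 // 10}x{height}"),
    ]


def apply_state(size_arg: str, serial: Optional[str] = None,
                dry_run: bool = True) -> List[str]:
    """Build (and, unless dry_run, execute) an `adb shell wm size` command."""
    cmd = ["adb"] + (["-s", serial] if serial else []) + ["shell", "wm", "size", size_arg]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


def explore(width: int, height: int, capture: Callable[[str], str],
            serial: Optional[str] = None, dry_run: bool = True) -> List[str]:
    """Visit each window state, call the hypothetical `capture` hook, then restore."""
    results = []
    try:
        for name, size_arg in window_states(width, height):
            apply_state(size_arg, serial, dry_run)
            # A real harness would wait for layout reflow before capturing.
            results.append(capture(name))
    finally:
        apply_state("reset", serial, dry_run)  # never leave the device resized
    return results


if __name__ == "__main__":
    for name, arg in window_states(1080, 2400):
        print(name, " ".join(apply_state(arg)))
```

`dry_run` defaults to `True` so the commands can be inspected without a connected device, and the `finally` clause restores the display size even if a capture step fails.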
Authors
Xinyao Zhang
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Rui Wang
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Jinhao Cui
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Haotian Huang
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Wei Xue
Dongfeng Motor Corporation
Wenhua Hu
School of Computer Science and Artificial Intelligence, Engineering Research Center of Transportation Information and Safety (ERCTIS), MoE of China, Wuhan University of Technology
Jianwen Xiang
Wuhan University of Technology
Dependable Computing, Software Engineering, Formal Methods, Knowledge Management
Rui Hao
School of Computer Science and Artificial Intelligence, Engineering Research Center of Transportation Information and Safety (ERCTIS), MoE of China, Wuhan University of Technology