🤖 AI Summary
Existing methods struggle to accurately and scalably detect user-perceivable GUI response latency in industrial settings. This paper proposes a lightweight black-box video analysis approach that leverages computer vision to identify frame-level temporal offsets between user interaction events (e.g., taps, swipes) and the corresponding UI feedback, enabling fine-grained quantification of both response time and feedback stabilization time. The method operates directly on screencasts, requiring no source code access or system-level instrumentation, and is designed for seamless integration into large-scale automated testing pipelines. Evaluated on 2,458 real-world interaction samples, it achieves 96% precision and 93% recall in interaction detection, with over 89% of response time estimates within ±50 ms and finish time estimates within ±100 ms. Deployed in an industrial testing pipeline, the approach bridges critical gaps left by static analysis and low-level system metrics in modeling end-user experience.
📝 Abstract
GUI responsiveness is critical for a positive user experience in mobile applications. Even brief delays in visual feedback can frustrate users and lead to negative reviews. However, detecting and quantifying such user-perceived delays remains challenging, especially in industrial testing pipelines that evaluate thousands of apps daily across diverse devices and OS versions. Existing techniques based on static analysis or system metrics, while useful, may not accurately capture user-perceived issues or scale effectively.
In this experience paper, we present ool, a lightweight and black-box technique that measures GUI responsiveness directly from mobile screencasts -- video recordings captured during automated GUI testing. ool detects user interactions and visual delays, helping developers identify GUI performance issues that affect the user experience. It uses computer vision to detect user interactions and analyzes frame-level visual changes to compute two key metrics: response time (from user action to first visual feedback) and finish time (until visual feedback stabilizes). We evaluate ool on a manually annotated benchmark of 2,458 interactions from 64 popular Android apps. ool achieves 0.96 precision and 0.93 recall in detecting interactions, and measures response and finish times within 50 ms and 100 ms error, respectively, for over 89% of interactions. The tool has been deployed in an industrial testing pipeline and analyzes thousands of screencasts daily, uncovering responsiveness issues missed by traditional tools and improving performance debugging efficiency.
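The paper does not include the tool's implementation here, but the two metrics it defines can be illustrated with a minimal frame-differencing sketch: response time is the delay from the interaction frame to the first frame that visibly differs from it, and finish time is the delay until consecutive frames stop changing. The function below is a hypothetical simplification (the names `measure_latency`, the thresholds, and the stability window are all assumptions, not the paper's algorithm), operating on grayscale frames as NumPy arrays.

```python
import numpy as np

def frame_diff(a, b):
    # Mean absolute pixel difference between two grayscale frames.
    return float(np.mean(np.abs(a.astype(np.int16) - b.astype(np.int16))))

def measure_latency(frames, tap_idx, fps=60,
                    change_thresh=2.0, stable_thresh=0.5, stable_frames=5):
    """Return (response_ms, finish_ms) relative to the tap frame.

    response: first frame after the tap that visibly differs from the tap frame.
    finish:   last frame that changed before `stable_frames` consecutive
              near-identical frames (feedback has stabilized).
    Returns None if no visual feedback is detected. Illustrative only.
    """
    base = frames[tap_idx]
    response_idx = None
    for i in range(tap_idx + 1, len(frames)):
        if frame_diff(frames[i], base) > change_thresh:
            response_idx = i
            break
    if response_idx is None:
        return None

    finish_idx = response_idx
    run = 0  # count of consecutive "stable" (near-identical) frames
    for i in range(response_idx + 1, len(frames)):
        if frame_diff(frames[i], frames[i - 1]) < stable_thresh:
            run += 1
            if run >= stable_frames:
                finish_idx = i - stable_frames  # last frame that still changed
                break
        else:
            run = 0
            finish_idx = i

    to_ms = 1000.0 / fps
    return (response_idx - tap_idx) * to_ms, (finish_idx - tap_idx) * to_ms
```

For example, at 60 fps a first visible change six frames after the tap corresponds to a 100 ms response time; this per-frame granularity (about 16.7 ms at 60 fps) is what makes errors within 50 ms and 100 ms achievable from video alone.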