Streamlining Acceptance Test Generation for Mobile Applications Through Large Language Models: An Industrial Case Study

📅 2025-10-21

🤖 AI Summary

Mobile acceptance testing is still held back by the cost of writing and maintaining tests, especially for cross-platform frameworks such as Flutter. This paper introduces AToMIC, a framework that uses specialized large language models (LLMs) to generate industrial-grade mobile acceptance tests end to end. Given JIRA requirements and recent code changes, AToMIC generates Gherkin scenarios, Page Object classes, and executable Flutter UI test scripts. It combines an understanding of the requirement text, awareness of code differences, and domain-specific syntactic constraints to produce maintainable test artifacts. Evaluated on 13 real-world issues from BMW's MyBMW app, AToMIC produced executable test artifacts in under five minutes per feature; 93.3% of the Gherkin scenarios were syntactically correct on generation, 78.8% of the Page Objects ran without manual edits, and 100% of the generated UI tests executed successfully.

📝 Abstract

Mobile acceptance testing remains a bottleneck in modern software development, particularly for cross-platform mobile development using frameworks like Flutter. While developers increasingly rely on automated testing tools, creating and maintaining acceptance test artifacts still demands significant manual effort. To help tackle this issue, we introduce AToMIC, an automated framework leveraging specialized Large Language Models to generate Gherkin scenarios, Page Objects, and executable UI test scripts directly from requirements (JIRA tickets) and recent code changes. Applied to BMW's MyBMW app, covering 13 real-world issues in a 170+ screen codebase, AToMIC produced executable test artifacts in under five minutes per feature on standard hardware. The generated artifacts were of high quality: 93.3% of Gherkin scenarios were syntactically correct upon generation, 78.8% of PageObjects ran without manual edits, and 100% of generated UI tests executed successfully. In a survey, all practitioners reported time savings (often a full developer-day per feature) and strong confidence in adopting the approach. These results confirm AToMIC as a scalable, practical solution for streamlining acceptance test creation and maintenance in industrial mobile projects.
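To make the "syntactically correct upon generation" metric more concrete, here is a minimal sketch of a structural check for a generated Gherkin scenario. The abstract does not say how AToMIC validates syntax, so the rules below (a Feature header, at least one step per Scenario, steps opening with a Gherkin keyword) are an assumption covering only a small subset of Gherkin, and the function name `check_gherkin` and the sample scenario are hypothetical.

```python
# Assumption: a small subset of Gherkin, not the validator used in the paper.
STEP_KEYWORDS = ("Given ", "When ", "Then ", "And ", "But ")


def check_gherkin(text: str) -> list:
    """Return a list of structural problems; an empty list means the text passed."""
    problems = []
    seen_feature = False
    scenario = None  # name of the scenario currently being read
    steps = 0        # steps counted so far in that scenario

    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "@")):
            continue  # blank lines, comments, and tags carry no structure
        if line.startswith("Feature:"):
            seen_feature = True
        elif line.startswith("Scenario:"):
            if not seen_feature:
                problems.append(f"line {number}: Scenario before Feature")
            if scenario is not None and steps == 0:
                problems.append(f"scenario '{scenario}' has no steps")
            scenario = line[len("Scenario:"):].strip()
            steps = 0
        elif line.startswith(STEP_KEYWORDS):
            if scenario is None:
                problems.append(f"line {number}: step outside a Scenario")
            else:
                steps += 1
        elif scenario is not None:
            problems.append(f"line {number}: unrecognised line in scenario")
        # Free text between Feature and the first Scenario is a description.

    if not seen_feature:
        problems.append("no Feature declared")
    if scenario is not None and steps == 0:
        problems.append(f"scenario '{scenario}' has no steps")
    return problems


# Hypothetical scenario, written for illustration only.
EXAMPLE = """\
Feature: Vehicle status
  Scenario: Lock state is shown
    Given the user is on the vehicle screen
    When the status refreshes
    Then the lock state is displayed
"""

print(check_gherkin(EXAMPLE))
```

A check of this kind only covers structure; whether a scenario matches the requirement in the ticket still needs human or downstream review.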

Problem

Research questions and friction points this paper is trying to address.

Automating mobile acceptance test generation from requirements

Reducing manual effort in creating Flutter cross-platform tests

Generating executable UI tests directly from JIRA tickets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging Large Language Models for automated test generation

Generating Gherkin scenarios and UI tests from requirements

Producing executable test artifacts in under five minutes per feature
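As a rough illustration of the inputs and outputs named above, the sketch below combines a requirement ticket and a code diff into one generation request per artifact type. It is not AToMIC's implementation: the `Ticket` fields, the `build_request` function, the request wording, and the sample ticket are all hypothetical, and no model is actually called.

```python
from dataclasses import dataclass

# The three artifact types the summary lists as outputs.
ARTIFACTS = ("gherkin_scenarios", "page_objects", "ui_test_script")


@dataclass
class Ticket:
    key: str          # issue key; "APP-123" below is made up
    summary: str
    description: str


def build_request(ticket: Ticket, diff: str, artifact: str) -> str:
    """Assemble the text of one generation request (hypothetical format)."""
    if artifact not in ARTIFACTS:
        raise ValueError(f"unknown artifact type: {artifact}")
    return "\n".join([
        f"Task: generate {artifact} for a Flutter app.",
        f"Requirement {ticket.key}: {ticket.summary}",
        ticket.description,
        "Recent code changes (unified diff):",
        diff,
    ])


# Illustrative inputs only; neither comes from the paper.
ticket = Ticket(
    key="APP-123",
    summary="Show lock state",
    description="As a driver I want to see whether my car is locked.",
)
requests = {name: build_request(ticket, "+ Text('Locked')", name) for name in ARTIFACTS}
print(requests["gherkin_scenarios"])
```

Keeping one request per artifact type makes it easy to validate each output separately, for example by checking Gherkin syntax before generating the UI test script.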
