Supporting architecture evaluation for ATAM scenarios with LLMs

📅 2025-05-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the labor-intensive, time-consuming, and subjective nature of quality-attribute scenario selection and trade-off analysis in Architecture Tradeoff Analysis Method (ATAM) evaluations. It integrates a commercial large language model (LLM), Microsoft Copilot, into pedagogical ATAM practice to support risk identification, sensitivity-point analysis, and cross-attribute trade-off reasoning. Methodologically, key ATAM process steps are formalized as structured prompts, turning them into LLM-executable tasks. An initial empirical evaluation on student-produced case studies shows that the LLM-generated outputs, covering risks, sensitivity points, and trade-off assessments, are in most cases more accurate than the students' initial assessments and substantially reduce evaluation turnaround time. The primary contributions are: (1) initial empirical evidence for the efficacy and feasibility of commercial LLMs in architectural quality-scenario analysis; and (2) a reusable technical pathway and evidence-based foundation for advancing both ATAM automation and architecture education.
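The "LLM-executable tasks" mentioned above can be made concrete with a prompt template. Below is a minimal sketch, assuming a plain templating approach in Python: the six fields follow the standard six-part ATAM scenario form, but the class, template wording, and function name are illustrative assumptions, not the paper's actual prompts (the study used Microsoft Copilot interactively).

```python
from dataclasses import dataclass

@dataclass
class QualityScenario:
    """One quality-attribute scenario in the six-part ATAM form."""
    attribute: str        # e.g. "performance", "availability"
    source: str           # who or what generates the stimulus
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str

# Illustrative template; the paper's real prompts are not reproduced here.
ANALYSIS_PROMPT = """\
You are a software architecture evaluator applying ATAM.

Quality attribute: {attribute}
Scenario: {source} issues {stimulus} to {artifact} while {environment}; \
the system responds with {response}, measured by {response_measure}.

For this scenario, list:
1. Risks (decisions that may jeopardize the quality attribute).
2. Sensitivity points (decisions a single quality response hinges on).
3. Trade-off points (decisions affecting several quality attributes).
"""

def build_analysis_prompt(scenario: QualityScenario) -> str:
    """Render one scenario into an analysis task for a chat-style LLM."""
    return ANALYSIS_PROMPT.format(**scenario.__dict__)
```

Reusing one template per scenario is what would make the evaluation step repeatable: the rendered string can be pasted into any chat-style LLM, Copilot included.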

📝 Abstract
Architecture evaluation methods have long been used to assess software designs, and several have been proposed to analyze tradeoffs between quality attributes. Competing qualities lead to conflicts when selecting the quality-attribute scenarios an architecture should address and when prioritizing the scenarios required by stakeholders. In practice, architecture evaluation is carried out manually, often through long brainstorming sessions to decide which quality scenarios are most adequate. To reduce this effort and make the assessment and selection of scenarios more efficient, we propose using LLMs to partially automate evaluation activities. As a first step to validate this hypothesis, this work studies MS Copilot as an LLM tool to analyze quality scenarios suggested by students in a software architecture course and compares the students' results with the assessment provided by the LLM. Our initial study reveals that the LLM in most cases produces better and more accurate results regarding the risks, sensitivity points, and tradeoff analysis of the quality scenarios. Overall, generative AI has the potential to partially automate and support architecture evaluation tasks, improving the human decision-making process.
Problem

Research questions and friction points this paper is trying to address.

Automating architecture evaluation using LLMs for efficiency
Analyzing tradeoffs between quality attributes in software designs
Comparing human and LLM assessments of quality scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs to automate architecture evaluation
Comparing student and LLM scenario assessments (see the sketch after this list)
Generative AI improves decision-making in evaluations
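The student-versus-LLM comparison above is reported qualitatively in the paper. Purely for illustration, the hypothetical sketch below shows one simple way such assessments could be scored against each other, using set overlap; the assessment_overlap helper and the example findings are invented, not taken from the study.

```python
def assessment_overlap(student_items: set[str], llm_items: set[str]) -> float:
    """Jaccard overlap between two sets of normalized findings
    (risks, sensitivity points, or trade-off points)."""
    if not student_items and not llm_items:
        return 1.0  # both empty: trivially identical
    return len(student_items & llm_items) / len(student_items | llm_items)

# Invented example findings, purely for illustration:
students = {"single database is a bottleneck", "no failover for auth service"}
llm = {"single database is a bottleneck", "no failover for auth service",
       "cache invalidation degrades latency under load"}
print(f"overlap: {assessment_overlap(students, llm):.2f}")  # prints 0.67
```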
Rafael Capilla
Professor of Software Engineering, Universidad Rey Juan Carlos
Software architecture, software variability, software sustainability, I4.0, LLMs applied to SE
J. A. Díaz-Pace
ISISTAN, CONICET/UNICEN, Tandil, Buenos Aires, Argentina
Yamid Ramírez
Rey Juan Carlos University, Madrid, Spain
Jennifer Pérez
Universidad Politécnica de Madrid, Spain
Vanessa Rodríguez-Horcajo
Universidad Politécnica de Madrid, Spain