🤖 AI Summary
This study addresses the high cost and rapid obsolescence of manually constructed software architecture views, which struggle to keep pace with the evolution of complex systems. It presents the first systematic evaluation of large language models (LLMs) and agent-based approaches for automatically generating architectural views from source code. The evaluation covers three LLMs, three prompting strategies (including few-shot prompting), and two agent architectures, applied across 340 open-source projects to produce 4,137 views, which were validated with both automated metrics and human assessment. Results indicate that a custom, specialized agent performs best on view clarity (22.6% failure rate) and level of detail (50% success rate). Few-shot prompting reduces clarity failure rates by 9.2% compared to zero-shot prompting; however, existing methods still fall short of reliably producing architecture-level abstractions.
📝 Abstract
Architecture views are essential for software architecture documentation, yet their manual creation is labor-intensive and often leads to outdated artifacts. As systems grow in complexity, the automated generation of views from source code becomes increasingly valuable. Goal: We empirically evaluate the ability of LLMs and agentic approaches to generate architecture views from source code. Method: We analyze 340 open-source repositories across 13 experimental configurations using 3 LLMs with 3 prompting techniques and 2 agentic approaches, yielding 4,137 generated views. We evaluate the generated views by comparing them with the ground truth, using automated metrics complemented by human evaluations. Results: Prompting strategies offer marginal improvements: few-shot prompting reduces clarity failures by 9.2% compared to zero-shot baselines. The custom agentic approach consistently outperforms the general-purpose agent, achieving the best clarity (22.6% failure rate) and level-of-detail success rate (50%). Conclusions: LLM-based and agentic approaches can generate syntactically valid architecture views. However, they consistently exhibit granularity mismatches, operating at the code level rather than at architectural abstractions. This suggests a continued need for human expertise, positioning LLMs and agents as assistive tools rather than autonomous architects.