🤖 AI Summary
This study systematically investigates the dual impact of large language models (LLMs) on software maintainability and evolvability. Drawing on a systematic literature review of 87 studies published between 2020 and 2024, and employing a hybrid analytical approach that combines LLM-assisted synthesis with human validation, the work introduces a three-dimensional “benefits–risks–mitigations” framework. The findings show that LLMs significantly enhance code analyzability, testability, and debugging support; however, their susceptibility to hallucination and contextual fragility poses notable threats to the long-term sustainability of software systems. The research offers both theoretical grounding and practical guidance for understanding the nuanced, two-sided role of LLMs in software evolution.
📝 Abstract
**Context.** Large Language Models (LLMs) are increasingly embedded in software engineering workflows for tasks including code generation, summarization, repair, and testing. Empirical studies report productivity gains, improved comprehension, and reduced cognitive load. However, evidence remains fragmented, and concerns persist about hallucinations, unstable outputs, methodological limitations, and emerging forms of technical debt. How these mixed effects shape long-term software maintainability and evolvability remains unclear.

**Objectives.** This study systematically examines how LLMs influence the maintainability and evolvability of software systems. We identify which quality attributes are addressed in existing research, the positive impacts LLMs provide, the risks and weaknesses they introduce, and the mitigation strategies proposed in the literature.

**Method.** We conducted a systematic literature review. Searches across ACM DL, IEEE Xplore, and Scopus (2020 to 2024) yielded 87 primary studies. Qualitative evidence was extracted through a calibrated multi-researcher process. Attributes were analyzed descriptively, while impacts, risks, weaknesses, and mitigation strategies were synthesized using a hybrid thematic approach supported by an LLM-assisted analysis tool with human-in-the-loop validation.

**Results.** LLMs provide benefits such as improved analyzability, testability, code comprehension, debugging support, and automated repair. However, they also introduce risks, including hallucinated or incorrect outputs, brittleness to context, limited domain reasoning, unstable performance, and flaws in current evaluations, which threaten long-term evolvability.

**Conclusion.** LLMs can strengthen maintainability and evolvability, but they also pose nontrivial risks to long-term sustainability. Responsible adoption requires safeguards, rigorous evaluation, and structured human oversight.