A Survey on Large Language Model Impact on Software Evolvability and Maintainability: the Good, the Bad, the Ugly, and the Remedy

📅 2026-01-26
🤖 AI Summary
This study systematically investigates the dual impact of large language models (LLMs) on software maintainability and evolvability. Drawing on a systematic literature review of 87 studies published between 2020 and 2024, and employing a hybrid analytical approach that combines LLM-assisted synthesis with human validation, the work introduces a novel three-dimensional “benefits–risks–mitigations” framework. The findings reveal that LLMs significantly enhance code analyzability, testability, and debugging support; however, their susceptibility to hallucinations and contextual fragility poses notable threats to the long-term sustainability of software systems. This research provides both theoretical grounding and practical guidance for understanding the nuanced, dual role of LLMs in software evolution.

📝 Abstract
Context. Large Language Models (LLMs) are increasingly embedded in software engineering workflows for tasks including code generation, summarization, repair, and testing. Empirical studies report productivity gains, improved comprehension, and reduced cognitive load. However, evidence remains fragmented, and concerns persist about hallucinations, unstable outputs, methodological limitations, and emerging forms of technical debt. How these mixed effects shape long-term software maintainability and evolvability remains unclear.

Objectives. This study systematically examines how LLMs influence the maintainability and evolvability of software systems. We identify which quality attributes are addressed in existing research, the positive impacts LLMs provide, the risks and weaknesses they introduce, and the mitigation strategies proposed in the literature.

Method. We conducted a systematic literature review. Searches across ACM DL, IEEE Xplore, and Scopus (2020 to 2024) yielded 87 primary studies. Qualitative evidence was extracted through a calibrated multi-researcher process. Attributes were analyzed descriptively, while impacts, risks, weaknesses, and mitigation strategies were synthesized using a hybrid thematic approach supported by an LLM-assisted analysis tool with human-in-the-loop validation.

Results. LLMs provide benefits such as improved analyzability, testability, code comprehension, debugging support, and automated repair. However, they also introduce risks, including hallucinated or incorrect outputs, brittleness to context, limited domain reasoning, unstable performance, and flaws in current evaluations, which threaten long-term evolvability.

Conclusion. LLMs can strengthen maintainability and evolvability, but they also pose nontrivial risks to long-term sustainability. Responsible adoption requires safeguards, rigorous evaluation, and structured human oversight.
Problem
Research questions and friction points this paper is trying to address.
Tags: Large Language Models, Software Evolvability, Software Maintainability, Technical Debt, Hallucination

Innovation
Methods, ideas, or system contributions that make the work stand out.
Tags: Large Language Models, Software Maintainability, Software Evolvability, Systematic Literature Review, Human-in-the-Loop
Authors
Bruno Claudino Matias
Computer Science, Virginia Commonwealth University, Richmond, VA, United States
S. Freire
Federal Institute of Ceara, CE, Brazil
Juliana Freitas
Computer Science and Engineering, Louisiana State University, Baton Rouge, LA, United States
Felipe Fronchetti
Computer Science and Engineering, Louisiana State University, Baton Rouge, LA, United States
Kostadin Damevski
Professor of Computer Science, Virginia Commonwealth University (Software Engineering, Mining Software Repositories, Natural Language Processing)
Rodrigo O. Spínola
Computer Science, Virginia Commonwealth University, Richmond, VA, United States