🤖 AI Summary
This study systematically investigates the dual impact of large language models (LLMs) on software maintainability and evolvability. Drawing on a systematic literature review of 87 studies published between 2020 and 2024, and employing a hybrid analytical approach that combines LLM-assisted synthesis with human validation, the work introduces a three-dimensional “benefits–risks–mitigations” framework. The findings show that LLMs significantly enhance code analyzability, testability, and debugging support; however, their susceptibility to hallucination and contextual fragility poses notable threats to the long-term sustainability of software systems. The research offers both theoretical grounding and practical guidance for understanding the nuanced, two-sided role of LLMs in software evolution.
📝 Abstract
**Context.** Large Language Models (LLMs) are increasingly embedded in software engineering workflows for tasks including code generation, summarization, repair, and testing. Empirical studies report productivity gains, improved comprehension, and reduced cognitive load. However, evidence remains fragmented, and concerns persist about hallucinations, unstable outputs, methodological limitations, and emerging forms of technical debt. How these mixed effects shape long-term software maintainability and evolvability remains unclear.

**Objectives.** This study systematically examines how LLMs influence the maintainability and evolvability of software systems. We identify which quality attributes are addressed in existing research, the positive impacts LLMs provide, the risks and weaknesses they introduce, and the mitigation strategies proposed in the literature.

**Method.** We conducted a systematic literature review. Searches across ACM DL, IEEE Xplore, and Scopus (2020 to 2024) yielded 87 primary studies. Qualitative evidence was extracted through a calibrated multi-researcher process. Attributes were analyzed descriptively, while impacts, risks, weaknesses, and mitigation strategies were synthesized using a hybrid thematic approach supported by an LLM-assisted analysis tool with human-in-the-loop validation.

**Results.** LLMs provide benefits such as improved analyzability, testability, code comprehension, debugging support, and automated repair. However, they also introduce risks, including hallucinated or incorrect outputs, brittleness to context, limited domain reasoning, unstable performance, and flaws in current evaluations, which threaten long-term evolvability.

**Conclusion.** LLMs can strengthen maintainability and evolvability, but they also pose nontrivial risks to long-term sustainability. Responsible adoption requires safeguards, rigorous evaluation, and structured human oversight.