AI Summary
This study addresses the lack of a systematic synthesis on how large language model (LLM) assistants affect developer productivity. We conduct a rigorous literature review grounded in the SPACE framework (Satisfaction, Performance, Activity, Communication, and Efficiency), analyzing 37 empirical studies published between 2014 and 2024. Our analysis reveals, for the first time, the dual effects of LLM assistants: they significantly enhance coding, testing, and debugging efficiency, yet may impair team communication and collaboration. We find that while 92% of studies adopt a multi-dimensional evaluation, only 14% extend beyond three SPACE dimensions; moreover, most are short-term, individual-level experiments, with scant validation in longitudinal, team-based settings. This work bridges a critical gap in cross-dimensional integrative analysis. To support reproducibility and future research, we publicly release our annotated dataset and coding scheme, establishing both a theoretical benchmark and a methodological paradigm for human-AI collaborative software engineering.
Abstract
Large language model assistants (LLM-assistants) present new opportunities to transform software development. Developers are increasingly adopting these tools across tasks, including coding, testing, debugging, documentation, and design. Yet, despite growing interest, there is no synthesis of how LLM-assistants affect software developer productivity. In this paper, we present a systematic literature review of 37 peer-reviewed studies published between January 2014 and December 2024 that examine this impact. Our analysis reveals that LLM-assistants offer both considerable benefits and critical risks. Commonly reported gains include minimized code search, accelerated development, and the automation of trivial and repetitive tasks. However, studies also highlight concerns around cognitive offloading, reduced team collaboration, and inconsistent effects on code quality. While the majority of studies (92%) adopt a multi-dimensional perspective by examining at least two SPACE dimensions, reflecting increased awareness of the complexity of developer productivity, only 14% extend beyond three dimensions, indicating substantial room for more integrated evaluations. Satisfaction, Performance, and Efficiency are the most frequently investigated dimensions, whereas Communication and Activity remain underexplored. Most studies are exploratory (64%) and methodologically diverse, but lack longitudinal and team-based evaluations. This review surfaces key research gaps and provides recommendations for future research and practice. All artifacts associated with this study are publicly available at https://zenodo.org/records/15788502.