Developers' Experience with Generative AI -- First Insights from an Empirical Mixed-Methods Field Study

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how professional developers' interaction patterns with generative AI tools (in-code suggestions, chat-based prompting, and combined usage) affect coding efficiency, accuracy, and cognitive load in an enterprise setting. Using an embedded mixed-methods design, it combines controlled experiments with naturalistic workplace observation, collecting behavioral logs, NASA-TLX subjective workload ratings, and task completion metrics in parallel. The work systematically examines whether developers' subjective experience aligns with their objective behavior. Results show that moderate AI interaction significantly improves efficiency and reduces cognitive load, whereas excessive or combined usage yields diminishing returns; chat-based prompting notably improves task accuracy. During everyday work, routine GenAI use was perceived to raise both cognitive load and productivity at once, challenging the assumption that its effects are uniformly beneficial. These findings yield empirically grounded design principles for human-AI collaboration in software engineering.
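The summary mentions NASA-TLX subjective workload ratings. As background, here is a minimal sketch of how the standard NASA-TLX score is computed from the six subscale ratings, in both its weighted and unweighted ("Raw TLX") variants; the example ratings and weights below are invented for illustration and are not data from the study.

```python
# Standard NASA-TLX: six subscales rated 0-100, optionally weighted by
# 15 pairwise comparisons that distribute 15 "wins" across the subscales.
SUBSCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def weighted_tlx(ratings: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted NASA-TLX: sum(rating * weight) / 15."""
    assert sum(weights.values()) == 15, "weights must come from 15 pairwise comparisons"
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15.0

def raw_tlx(ratings: dict[str, float]) -> float:
    """Unweighted 'Raw TLX' variant: simple mean of the six ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

# Invented example ratings (0-100 scale) and pairwise-comparison weights.
ratings = {"mental": 70, "physical": 10, "temporal": 55,
           "performance": 30, "effort": 60, "frustration": 40}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}

print(round(raw_tlx(ratings), 2))           # → 44.17
print(weighted_tlx(ratings, weights))       # → 57.0
```

The weighted variant emphasizes the subscales a participant judged most relevant to the task, which is why studies often report it alongside the simpler unweighted mean.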

📝 Abstract
With the rise of AI-powered coding assistants, firms and programmers are exploring how to optimize their interaction with them. Research has so far mainly focused on evaluating output quality and productivity gains, leaving aside the developers' experience during the interaction. In this study, we take a multimodal, developer-centered approach to gain insights into how professional developers experience the interaction with Generative AI (GenAI) in their natural work environment in a firm. The aim of this paper is (1) to demonstrate a feasible mixed-method study design with controlled and uncontrolled study periods within a firm setting, (2) to give first insights from complementary behavioral and subjective experience data on developers' interaction with GitHub Copilot, and (3) to compare the impact of interaction types (no Copilot use, in-code suggestions, chat prompts, or both in-code suggestions and chat prompts) on efficiency, accuracy, and perceived workload whilst working on different task categories. Results of the controlled sessions in this study indicate that moderate use of either in-code suggestions or chat prompts improves efficiency (task duration) and reduces perceived workload compared to not using Copilot, while excessive or combined use lessens these benefits. Accuracy (task completion) benefits from chat interaction. In general, subjective perception of workload aligns with objective behavioral data in this study. During the uncontrolled period of the study, both higher cognitive load and higher productivity were perceived when interacting with AI during everyday working tasks. This study motivates the use of comparable study designs, e.g. in workshop or hackathon settings, to evaluate GenAI tools holistically and realistically with a focus on the developers' experience.
Problem

Research questions and friction points this paper is trying to address.

Investigates developers' experience with Generative AI in work environments
Compares interaction types' effects on efficiency, accuracy, and workload
Proposes a mixed-method design to holistically evaluate GenAI tools
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed-method study design in firm setting
Behavioral and subjective data on GitHub Copilot
Comparison of interaction types for efficiency, accuracy, and workload