🤖 AI Summary
While autonomous software agents can improve developer productivity, their mistakes and novel failure modes make human oversight essential, yet there is little empirical work on how developers actually oversee them. This study addresses that gap through semi-structured interviews with 17 experienced developers, providing early empirical anchors for the largely conceptual discourse on agent oversight. It identifies at least four emergent forms of oversight work: a priori control, co-planning, real-time monitoring, and post hoc review. These findings show that oversight is not only reactive and retrospective, as existing research portrays it, but also preventative and proactive. The work also documents situated oversight challenges, such as difficulty reviewing agent-generated code, and the heuristics developers adopt in response, such as treating test results as guarantees of code correctness. It closes with implications for the human-centered design of software agents and for software engineering practice.
📝 Abstract
Autonomous software agents hold promise to increase developer productivity but make mistakes and exhibit novel failure modes, making human oversight central to successful human-agent collaboration. Existing research on agent oversight is largely conceptual; normative frameworks exist, but how users actually oversee agents is less known. In this paper, we bridge this gap by providing early empirical anchors for the theoretical discourse on agent oversight. Drawing on interviews with 17 experienced developers, we conduct an exploratory inquiry examining what forms of emergent oversight work developers perform, when, and how. We also document the oversight challenges developers face and the strategies they have started using to address them. We found at least four forms of emergent oversight work: a priori control, co-planning, real-time monitoring, and post hoc review. We show that oversight work is not only reactive and retrospective, as portrayed in existing research, but also preventative and proactive. We describe situated oversight challenges (e.g., difficulty reviewing agent-generated code) and outline heuristics developers adopt to address such challenges (e.g., using test results as guarantees for code correctness). We conclude with high-level takeaways, future research directions, implications for the human-centered design of software agents and for software engineering practice, and limitations of our research.