🤖 AI Summary
AI programming tools risk fostering overreliance on model outputs, eroding developers' judgment—particularly in security-critical tasks—and increasing vulnerability exposure. To address this, we propose "human-in-the-loop decoding": a mechanism that highlights token-level decisions in real time, generates editable sets of local candidates, and exposes them through a collaborative decoding interface. Developers can thus observe, comprehend, and intervene in the model's critical generation decisions as they happen. Our approach integrates interactive decision visualization with intent-driven local substitution, embedding human intent directly into the code generation process. Empirical evaluation on security-sensitive programming tasks shows that our method significantly reduces vulnerability incidence (−42.3%) compared to state-of-the-art code completion tools, while improving task success rate (+38.7%) and perceived code controllability (+51.2%).
📝 Abstract
While AI programming tools hold the promise of remarkably increasing programmers' capabilities and productivity, they often exclude users from essential decision-making processes, causing many to effectively "turn off their brains" and over-rely on solutions provided by these systems. These behaviors can have severe consequences in critical domains, such as software security. We propose Human-in-the-loop Decoding, a novel interaction technique that allows users to observe and directly influence LLM decisions during code generation, in order to align the model's output with their personal requirements. We implement this technique in HiLDe, a code completion assistant that highlights critical decisions made by the LLM and provides local alternatives for the user to explore. In a within-subjects study (N=18) on security-related tasks, we found that HiLDe led participants to generate significantly fewer vulnerabilities and better align code generation with their goals compared to a traditional code completion assistant.
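One way to sketch the core idea — flagging "critical decisions" during decoding and surfacing local alternatives — is to treat decoding steps whose token distribution has high entropy as decision points and present the top-k candidate tokens at each. The abstract does not specify HiLDe's actual criterion, so the entropy threshold, the toy vocabulary, and the `critical_decisions` helper below are all illustrative assumptions, not the paper's implementation:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy (nats) of a probability distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def critical_decisions(step_logits, vocab, threshold=1.0, k=3):
    """Flag decoding steps whose distribution entropy exceeds `threshold`
    (an assumed proxy for a 'critical decision'), and return the top-k
    candidate tokens with their probabilities for the user to inspect."""
    flagged = []
    for step, logits in enumerate(step_logits):
        probs = softmax(logits)
        if entropy(probs) > threshold:
            top = sorted(range(len(probs)), key=lambda j: -probs[j])[:k]
            flagged.append((step, [(vocab[j], round(probs[j], 3)) for j in top]))
    return flagged

# Toy example: the model is uncertain between hash functions at step 0,
# but confident at step 1 — only step 0 is surfaced to the user.
vocab = ["md5", "sha256", "bcrypt"]
step_logits = [
    [2.0, 2.1, 1.9],  # near-uniform: a decision the user should review
    [5.0, 0.0, 0.0],  # peaked: the model is confident, no highlight
]
print(critical_decisions(step_logits, vocab))
```

In an interactive assistant, the flagged steps would be highlighted inline and the candidate list rendered as a clickable menu, letting the user substitute an alternative token and resume decoding from that point.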