🤖 AI Summary
Problem: Despite their promise, large language models (LLMs) face a critical “algorithm-to-application” gap in clinical deployment that hinders real-world integration into electronic health record (EHR) systems. Method: Drawing on empirical EHR deployment experience, we propose a systematic framework for implementing generative AI agents in clinical settings. It centers on sociotechnical implementation tasks, which accounted for over 80% of the deployment effort. The framework addresses five core challenges: EHR data integration, clinical validation of model trustworthiness, economic sustainability, adaptive management of model and system drift, and multi-stakeholder governance. It combines LLMs, prompt engineering, FHIR-compliant EHR interfaces, continuous performance monitoring, and governance protocols. Contribution/Results: We deployed irAE-Agent, a clinical AI agent that automatically identifies immune-related adverse events from clinical notes, at Mass General Brigham. Structured interviews with 20 clinicians, engineers, and informatics leaders involved in the project informed the framework, which is intended to help translate generative AI from pilot projects into routine clinical care.
📝 Abstract
Large language models (LLMs) integrated into agent-driven workflows hold immense promise for healthcare, yet a significant gap exists between their potential and practical implementation within clinical settings. To address this, we present a practitioner-oriented field manual for deploying generative agents that use electronic health record (EHR) data. This guide is informed by our experience deploying the "irAE-Agent", an automated system to detect immune-related adverse events from clinical notes at Mass General Brigham, and by structured interviews with 20 clinicians, engineers, and informatics leaders involved in the project. Our analysis reveals a critical misalignment in clinical AI development: less than 20% of our effort was dedicated to prompt engineering and model development, while over 80% was consumed by the sociotechnical work of implementation. We distill this effort into five "heavy lifts": data integration, model validation, ensuring economic value, managing system drift, and governance. By providing actionable solutions for each of these challenges, this field manual shifts the focus from algorithmic development to the essential infrastructure and implementation work required to bridge the "valley of death" and successfully translate generative AI from pilot projects into routine clinical care.