AI Summary
This study evaluates the alignment of large language models (LLMs) with individual clinicians' preferences in drafting responses to patient portal messages, aiming to assess their potential to reduce editorial burden. To this end, we introduce the first topic-level alignment evaluation framework tailored to individual clinician preferences, construct an expert-annotated dataset of thematic elements, and implement adaptation strategies including topic-aware prompting, retrieval-augmented generation, supervised fine-tuning, and direct preference optimization. Results demonstrate that LLMs perform well across most topics but struggle with critical ones requiring proactive questioning. Crucially, topic-driven adaptation strategies substantially improve response quality, underscoring the necessity and efficacy of fine-grained alignment assessment in clinical language generation.
Abstract
Large language models (LLMs) show promise in drafting responses to patient portal messages, yet their integration into clinical workflows raises various concerns, including whether they would actually save clinicians time and effort in their portal workload. We investigate LLM alignment with individual clinicians through a comprehensive evaluation of the patient message response drafting task. We develop a novel taxonomy of thematic elements in clinician responses and propose a novel evaluation framework for assessing clinician editing load of LLM-drafted responses at both content and theme levels. We release an expert-annotated dataset and conduct large-scale evaluations of local and commercial LLMs using various adaptation techniques including thematic prompting, retrieval-augmented generation, supervised fine-tuning, and direct preference optimization. Our results reveal substantial epistemic uncertainty in aligning LLM drafts with clinician responses. While LLMs demonstrate capability in drafting certain thematic elements, they struggle with clinician-aligned generation in other themes, particularly question asking to elicit further information from patients. Theme-driven adaptation strategies yield improvements across most themes. Our findings underscore the necessity of adapting LLMs to individual clinician preferences to enable reliable and responsible use in patient-clinician communication workflows.