How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting

2026-01-16
AI Summary
This study evaluates the alignment of large language models (LLMs) with individual clinicians' preferences in drafting responses to patient portal messages, aiming to assess their potential to reduce editorial burden. To this end, we introduce the first topic-level alignment evaluation framework tailored to individual clinician preferences, construct an expert-annotated dataset of thematic elements, and implement adaptation strategies including topic-aware prompting, retrieval-augmented generation, supervised fine-tuning, and direct preference optimization. Results demonstrate that LLMs perform well across most topics but struggle with critical ones requiring proactive questioning. Crucially, topic-driven adaptation strategies substantially improve response quality, underscoring the necessity and efficacy of fine-grained alignment assessment in clinical language generation.

Abstract
Large language models (LLMs) show promise in drafting responses to patient portal messages, yet their integration into clinical workflows raises various concerns, including whether they would actually save clinicians time and effort in their portal workload. We investigate LLM alignment with individual clinicians through a comprehensive evaluation of the patient message response drafting task. We develop a novel taxonomy of thematic elements in clinician responses and propose a novel evaluation framework for assessing clinician editing load of LLM-drafted responses at both content and theme levels. We release an expert-annotated dataset and conduct large-scale evaluations of local and commercial LLMs using various adaptation techniques including thematic prompting, retrieval-augmented generation, supervised fine-tuning, and direct preference optimization. Our results reveal substantial epistemic uncertainty in aligning LLM drafts with clinician responses. While LLMs demonstrate capability in drafting certain thematic elements, they struggle with clinician-aligned generation in other themes, particularly question asking to elicit further information from patients. Theme-driven adaptation strategies yield improvements across most themes. Our findings underscore the necessity of adapting LLMs to individual clinician preferences to enable reliable and responsible use in patient-clinician communication workflows.
Problem

Research questions and friction points this paper is trying to address.

LLM alignment
patient message response
clinician editing load
clinical workflow
thematic elements
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM alignment
clinician editing load
thematic prompting
retrieval-augmented generation
expert-annotated dataset
Parker Seegmiller
Dartmouth College
Natural Language Processing, Deep Learning, Healthcare
Joseph Gatto
Dartmouth College
Machine Learning, Natural Language Processing
Sarah E. Greer
Department of Computer Science, Dartmouth College
Ganza Belise Isingizwe
Department of Computer Science, Dartmouth College
Rohan Ray
Department of Computer Science, Dartmouth College
Timothy Burdick
Department of Community and Family Medicine, Dartmouth Health; The Dartmouth Institute, Dartmouth College
S. Preum
Department of Computer Science, Dartmouth College