AI Summary
This work addresses the limited prospective empathic capacity of large language models in healthcare settings, where existing approaches predominantly focus on post-hoc empathy recognition. The authors propose the Empathy Applicability Framework (EAF), which leverages clinical, contextual, and linguistic cues to prospectively classify the need for empathy in patient health inquiries. They construct the first benchmark dataset for empathy applicability in general health queries, annotated by both human experts and GPT-4o, and train a supervised classifier accordingly. Experimental results demonstrate that the classifier significantly outperforms heuristic and zero-shot baselines, achieving strong performance on consensus subsets. Error analysis highlights key challenges, including the detection of implicit distress and ambiguity in clinical severity. This study presents the first model of human-AI collaborative empathy judgment and demonstrates high agreement between human and AI assessments.
Abstract
LLMs are increasingly being integrated into clinical workflows, yet they often lack clinical empathy, an essential aspect of effective doctor-patient communication. Existing NLP frameworks focus on reactively labeling empathy in doctors' responses but offer limited support for anticipatory modeling of empathy needs, especially in general health queries. We introduce the Empathy Applicability Framework (EAF), a theory-driven approach that classifies patient queries in terms of the applicability of emotional reactions and interpretations, based on clinical, contextual, and linguistic cues. We release a benchmark of real patient queries, dual-annotated by humans and GPT-4o. In the subset with human consensus, we also observe substantial human-GPT alignment. To validate EAF, we train classifiers on human-labeled and GPT-only annotations to predict empathy applicability, achieving strong performance and outperforming heuristic and zero-shot LLM baselines. Error analysis highlights persistent challenges: implicit distress, clinical-severity ambiguity, and contextual hardship, underscoring the need for multi-annotator modeling, clinician-in-the-loop calibration, and culturally diverse annotation. EAF provides a framework for identifying empathy needs before response generation, establishes a benchmark for anticipatory empathy modeling, and supports empathetic communication in asynchronous healthcare.
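To make the task concrete, the pipeline described above (classifying whether a patient query calls for an empathic response) can be sketched as a minimal supervised baseline. This is an illustrative sketch only, not the authors' implementation: the example queries and binary label scheme are hypothetical, and a simple TF-IDF + logistic regression model stands in for the paper's classifier.

```python
# Minimal sketch of empathy-applicability classification (illustrative only).
# Queries and labels below are hypothetical placeholders, not the EAF benchmark.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1 = empathic response applicable, 0 = purely informational query.
queries = [
    "I was just diagnosed with cancer and I am terrified about what comes next.",
    "What is the recommended daily dose of vitamin D for adults?",
    "My father passed away from this condition and now I have the same symptoms.",
    "How long does a typical cold last?",
    "I can't sleep because I'm so anxious about my upcoming surgery.",
    "Is ibuprofen safe to take with food?",
]
labels = [1, 0, 1, 0, 1, 0]

# TF-IDF n-gram features + logistic regression as a stand-in classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(queries, labels)

# Predict empathy applicability for a new, unseen query.
prediction = clf.predict(["I'm scared my biopsy results mean something serious."])[0]
print(prediction)
```

In the paper's setting, the label space is derived from the EAF annotation scheme and the features come from clinical, contextual, and linguistic cues rather than raw n-grams; the sketch only shows the overall train-then-predict shape of anticipatory empathy classification.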