Application of CARE-SD text classifier tools to assess distribution of stigmatizing and doubt-marking language features in EHR

📅 2025-07-11
🤖 AI Summary
This study examines how stigmatizing and doubt-marking language is distributed in electronic health records (EHRs) across patient subgroups (race, insurance status, and stigmatized disease and mental health conditions) and across clinical provider types. It combines expanded lexicon matching, the CARE-SD text classification tools, supervised learning classifiers, and Poisson regression to quantify how often these linguistic features appear per chart. Stigmatizing labels occurred at higher rates in the charts of patients who were Black or African American (RR: 1.16), who had Medicare/Medicaid or other government-run insurance (RR: 2.46), or who were self-pay (RR: 2.12), as well as in the charts of patients with stigmatized conditions. Doubt markers followed similar patterns and were also more frequent for male patients (RR: 1.25). Nurses (RR: 1.40) and social workers (RR: 2.25) used stigmatizing labels at higher rates than physicians. The study offers a reproducible method and empirical evidence for detecting bias embedded in EHR documentation.

📝 Abstract
Introduction: Electronic health records (EHR) are a critical medium through which patient stigmatization is perpetuated among healthcare teams. Methods: We identified linguistic features of doubt markers and stigmatizing labels in MIMIC-III EHR via expanded lexicon matching and supervised learning classifiers. Predictors of rates of linguistic features were assessed using Poisson regression models. Results: We found higher rates of stigmatizing labels per chart among patients who were Black or African American (RR: 1.16), patients with Medicare/Medicaid or government-run insurance (RR: 2.46), patients who were self-pay (RR: 2.12), and patients with a variety of stigmatizing disease and mental health conditions. Patterns among doubt markers were similar, though male patients had higher rates of doubt markers (RR: 1.25). We found increased use of stigmatizing labels by nurses (RR: 1.40) and social workers (RR: 2.25), with similar patterns of doubt markers. Discussion: Stigmatizing language occurred at higher rates among historically stigmatized patients, perpetuated by multiple provider types.
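To make the rate ratios (RR) in the abstract concrete, here is a minimal sketch of how a single RR can be read. It assumes one binary predictor only: in that special case, a Poisson model with an intercept, one indicator, and a log-exposure offset has a closed-form estimate equal to the ratio of the two groups' events per chart. The function name `rate_ratio` and all counts are hypothetical and made up for illustration; the study's models presumably adjust for several predictors at once, so this is not the authors' analysis.

```python
import math


def rate_ratio(events_group, charts_group, events_ref, charts_ref, z=1.96):
    """Unadjusted rate ratio with a Wald confidence interval.

    For a single binary predictor this equals exp(beta) from a Poisson
    regression with log(charts) as the offset.
    """
    rate_group = events_group / charts_group  # features per chart, comparison group
    rate_ref = events_ref / charts_ref        # features per chart, reference group
    rr = rate_group / rate_ref

    # Standard error of log(RR) for Poisson counts.
    se_log_rr = math.sqrt(1 / events_group + 1 / events_ref)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Made-up counts for illustration only; these are NOT data from the study.
rr, lower, upper = rate_ratio(
    events_group=150, charts_group=100,  # e.g. 150 doubt markers in 100 charts
    events_ref=120, charts_ref=100,      # e.g. 120 doubt markers in 100 charts
)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An RR above 1 means the comparison group has more of the feature per chart than the reference group; an interval that includes 1 means the data are compatible with no difference.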
Problem

Research questions and friction points this paper is trying to address.

Analyzing stigmatizing language in EHRs across patient demographics
Identifying linguistic predictors of doubt markers in medical records
Assessing provider-type disparities in stigmatizing language usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used CARE-SD text classifier tools
Applied expanded lexicon matching
Employed supervised learning classifiers
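The lexicon-matching step listed above can be sketched roughly as follows. This is a minimal illustration, not the CARE-SD implementation: the term lists are hypothetical placeholders rather than the CARE-SD lexicons, the names `LEXICONS`, `compile_lexicon`, and `count_features` are invented for this example, and the supervised classification step that the paper pairs with lexicon matching is not shown.

```python
import re

# Hypothetical placeholder lexicons; NOT the CARE-SD lexicons.
LEXICONS = {
    "stigmatizing_label": ["noncompliant", "drug-seeking", "frequent flyer"],
    "doubt_marker": ["claims", "allegedly", "insists"],
}


def compile_lexicon(terms):
    """Build one case-insensitive pattern that matches any whole term."""
    # Longest terms first so multiword phrases win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    # Lookarounds keep matches from firing inside longer words.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


PATTERNS = {name: compile_lexicon(terms) for name, terms in LEXICONS.items()}


def count_features(note_text):
    """Count lexicon hits per feature type in a single note."""
    return {name: len(pattern.findall(note_text)) for name, pattern in PATTERNS.items()}


# Invented example note, not taken from MIMIC-III.
note = "Patient claims pain is 10/10. Noncompliant with meds; allegedly fell at home."
counts = count_features(note)
print(counts)
```

Raw lexicon hits over-count, because a word such as "claims" can appear in a neutral sense (for example, insurance claims). That ambiguity is the usual reason to follow matching with a classifier that judges each hit in context.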
Drew Walker
Department of Health Systems Science, Kaiser Permanente School of Medicine, Pasadena, CA, USA
Jennifer Love
Department of Emergency Medicine, Mount Sinai, New York, NY, USA
Swati Rajwal
Department of Computer Science, Emory College of Arts and Sciences, Emory University, Atlanta, GA, USA
Isabel C Walker
Children’s Heart Center, Children’s Healthcare of Atlanta, Atlanta, GA, USA
Hannah LF Cooper
Department of Behavioral, Social, Health Education Sciences, Rollins School of Public Health, Emory University, Atlanta, GA, USA
Abeed Sarker
Emory University School of Medicine
Natural Language Processing, Biomedical Informatics, Health Data Science, Applied Machine Learning
Melvin Livingston III
Department of Behavioral, Social, Health Education Sciences, Rollins School of Public Health, Emory University, Atlanta, GA, USA