Application of CARE-SD text classifier tools to assess distribution of stigmatizing and doubt-marking language features in EHR

📅 2025-07-11

🤖 AI Summary

This study examines how stigmatizing labels and doubt markers are distributed in electronic health records (EHRs) across patient subgroups (race, insurance status, and stigmatized disease and mental health conditions) and across clinical provider types. It combines expanded lexicon matching, the CARE-SD text classification tools, and supervised learning classifiers to identify these features in MIMIC-III notes, then uses Poisson regression to estimate rate ratios (RR) for their predictors. Stigmatizing labels per chart were more frequent among patients who were Black or African American (RR: 1.16), patients with Medicare/Medicaid or government-run insurance (RR: 2.46), self-pay patients (RR: 2.12), and patients with stigmatized disease and mental health conditions. Doubt markers followed similar patterns and were also more frequent for male patients (RR: 1.25). Nurses (RR: 1.40) and social workers (RR: 2.25) used stigmatizing labels at higher rates, with similar patterns for doubt markers. The study offers empirical evidence and a method for detecting stigmatizing language in EHR documentation.

📝 Abstract

Introduction: Electronic health records (EHRs) are a critical medium through which patient stigmatization is perpetuated among healthcare teams. Methods: We identified linguistic features of doubt markers and stigmatizing labels in MIMIC-III EHR notes via expanded lexicon matching and supervised learning classifiers. Predictors of rates of linguistic features were assessed using Poisson regression models. Results: We found higher rates of stigmatizing labels per chart among patients who were Black or African American (RR: 1.16), patients with Medicare/Medicaid or government-run insurance (RR: 2.46), self-pay patients (RR: 2.12), and patients with a variety of stigmatizing disease and mental health conditions. Patterns among doubt markers were similar, though male patients had higher rates of doubt markers (RR: 1.25). We found increased use of stigmatizing labels by nurses (RR: 1.40) and social workers (RR: 2.25), with similar patterns of doubt markers. Discussion: Stigmatizing language occurred at higher rates among historically stigmatized patients, perpetuated by multiple provider types.
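The rate ratios (RR) reported above come from Poisson regression. As a rough aid to reading them, the sketch below uses the fact that for a Poisson model with a single binary predictor, the exponentiated coefficient equals the ratio of mean counts per chart between the two groups. The counts and group names are invented for illustration and are not data from the study, and the study's models may include several predictors at once, which this sketch does not attempt to reproduce.

```python
import math


def rate_ratio(counts_group, counts_reference):
    """Ratio of mean feature counts per chart between two groups.

    For a Poisson regression with one binary predictor, exp(coefficient)
    equals this ratio, so it shows how an RR such as 1.16 is interpreted.
    """
    rate_group = sum(counts_group) / len(counts_group)
    rate_reference = sum(counts_reference) / len(counts_reference)
    return rate_group / rate_reference


# Hypothetical toy data: stigmatizing labels counted in each chart.
charts_group = [2, 0, 1, 3, 1]      # 7 labels over 5 charts
charts_reference = [1, 0, 1, 2, 1]  # 5 labels over 5 charts

rr = rate_ratio(charts_group, charts_reference)
beta = math.log(rr)  # the corresponding Poisson regression coefficient

print(f"RR = {rr:.2f}, coefficient = {beta:.3f}")
```

An RR above 1 means the feature appears more often per chart in the group than in the reference group; an RR of 1.40 in this toy example would read as 40% more labels per chart.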

Problem

Research questions and friction points this paper is trying to address.

Analyzing stigmatizing language in EHRs across patient demographics

Identifying patient-level predictors of doubt markers in medical records

Assessing provider-type disparities in stigmatizing language usage

Innovation

Methods, ideas, or system contributions that make the work stand out.

Used CARE-SD text classifier tools

Applied expanded lexicon matching

Employed supervised learning classifiers
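The lexicon-matching step listed above can be sketched roughly as follows. This is a minimal illustration, not the CARE-SD implementation: the lexicon entries, category names, and the example note are placeholders made up for this sketch, and the supervised classifiers that the study pairs with matching are omitted.

```python
import re

# Hypothetical mini-lexicon; these terms are illustrative placeholders,
# not entries taken from the CARE-SD lexicons.
LEXICON = {
    "doubt_marker": ["claims", "allegedly", "reportedly"],
    "stigmatizing_label": ["noncompliant", "drug seeker"],
}


def compile_lexicon(lexicon):
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for category, terms in lexicon.items():
        # Longest terms first so multi-word entries are tried before shorter ones.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[category] = re.compile(
            r"\b(?:" + alternation + r")\b", re.IGNORECASE
        )
    return patterns


def count_features(note, patterns):
    """Count lexicon matches per category in a single note."""
    return {category: len(p.findall(note)) for category, p in patterns.items()}


note = "Patient claims pain is 10/10. Reportedly noncompliant with medication."
counts = count_features(note, compile_lexicon(LEXICON))
print(counts)
```

Per-note counts like these can be summed per chart to form the outcome variable of a count model. Plain matching cannot tell whether a term is used in a stigmatizing sense, which is one reason to combine it with a classifier.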
