🤖 AI Summary
Traditional speech recognition metrics such as word error rate (WER) fail to capture the systematic impact of environmental noise on the safety of clinical documentation. This work proposes a paired acoustic stress test that injects diverse noise types into identical clinical dialogues while holding the downstream model configuration fixed, thereby isolating the causal effect of noise on clinical reasoning. The study reveals a disconnect between speech fidelity and clinical safety: under stationary ambient noise, a WER increase of only 0.71 percentage points nearly doubles the rate of unsafe model outputs. Building on this finding, the authors introduce a lightweight, model-agnostic safety mitigation strategy that curbs noise-induced degradation in clinical safety without requiring model fine-tuning.
📝 Abstract
Ambient clinical scribes increasingly combine Automatic Speech Recognition (ASR) with Large Language Models (LLMs) to automate documentation. However, traditional metrics like Word Error Rate (WER) mask systemic safety degradation. We present a paired acoustic stress test to isolate the causal impact of noise on clinical reasoning. For the same dialogues, we inject diverse noise types while keeping the downstream model configuration frozen. Crucially, we uncover a dangerous disconnect between signal fidelity and clinical safety. Stationary ambient noise increased the WER by a negligible 0.71 percentage points yet nearly doubled the rate of unsafe outputs. Our analysis reveals that minor acoustic perturbations can invert clinical meaning without substantially inflating error rates. Furthermore, we demonstrate a lightweight mitigation strategy that reduces safety degradation under noisy conditions without requiring model fine-tuning.
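To make the gap between WER and clinical meaning concrete, the following minimal sketch computes WER as word-level edit distance and compares a clean and a noisy transcript of the same utterance, in the spirit of the paired design. The example sentence, the transcripts, and the helper names `word_error_rate` and `wer_delta_points` are illustrative assumptions, not data or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def wer_delta_points(reference: str, clean_hyp: str, noisy_hyp: str) -> float:
    """Paired WER change, in percentage points, for one utterance."""
    clean = word_error_rate(reference, clean_hyp)
    noisy = word_error_rate(reference, noisy_hyp)
    return 100.0 * (noisy - clean)


# Hypothetical utterance: the noisy transcript drops a single "no".
reference = "patient reports no chest pain and no shortness of breath today"
clean_hyp = reference
noisy_hyp = "patient reports chest pain and no shortness of breath today"

noisy_wer = word_error_rate(reference, noisy_hyp)  # one deletion out of 11 words
delta = wer_delta_points(reference, clean_hyp, noisy_hyp)
```

Here a single deleted word changes "no chest pain" into "chest pain": the edit count is one, yet the clinical statement is reversed. Averaged over long dialogues, such isolated errors barely move the aggregate WER, which is why a paired comparison of downstream outputs on the same dialogue is more informative than WER alone.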