EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning

📅 2025-10-02

🤖 AI Summary

Automatic speech recognition (ASR) is often inaccurate for deaf and hard-of-hearing (DHH) speakers, and existing personalization approaches require extensive pre-recorded speech, which places the adaptation burden on the DHH speaker. This paper presents EvolveCaptions, a real-time collaborative ASR adaptation system that shares that burden between conversation partners. During live conversation, hearing participants correct ASR errors. From these corrections, the system generates short, phonetically targeted prompts for the DHH speaker to record, and the recordings are used to fine-tune the ASR model. In a study with 12 DHH and 6 hearing participants, word error rate (WER) fell for all DHH users within one hour of use, with about five minutes of recording time on average. Participants described the system as intuitive, low-effort, and well integrated into communication.

📝 Abstract

Automatic Speech Recognition (ASR) systems often fail to accurately transcribe speech from Deaf and Hard of Hearing (DHH) individuals, especially during real-time conversations. Existing personalization approaches typically require extensive pre-recorded data and place the burden of adaptation on the DHH speaker. We present EvolveCaptions, a real-time, collaborative ASR adaptation system that supports in-situ personalization with minimal effort. Hearing participants correct ASR errors during live conversations. Based on these corrections, the system generates short, phonetically targeted prompts for the DHH speaker to record, which are then used to fine-tune the ASR model. In a study with 12 DHH and six hearing participants, EvolveCaptions reduced Word Error Rate (WER) across all DHH users within one hour of use, using only five minutes of recording time on average. Participants described the system as intuitive, low-effort, and well-integrated into communication. These findings demonstrate the promise of collaborative, real-time ASR adaptation for more equitable communication.
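The abstract reports results as Word Error Rate. As a reference point, WER is conventionally the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code; the function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: text is normalized by lowercasing and splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("please pass the salt", "please pass a salt"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.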

Problem

Research questions and friction points this paper is trying to address.

Real-time ASR fails to accurately transcribe DHH individuals' speech

Existing personalization requires extensive data and burdens DHH speakers

How a collaborative system can enable in-situ ASR adaptation with minimal effort

Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time collaborative ASR adaptation system

Generates phonetically targeted prompts for recording

Fine-tunes ASR model using minimal recording time
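The summary does not say how corrections are turned into recording prompts. One plausible first step, shown here purely as a hypothetical sketch, is to align the ASR output with the hearing participant's correction and collect the words that had to be changed; those words could then seed prompts for the DHH speaker to record. The function name `misrecognized_words` and the word-level diff are assumptions for illustration, not the authors' method, and the phonetic targeting step is not modeled.

```python
from difflib import SequenceMatcher


def misrecognized_words(asr_text: str, corrected_text: str) -> list[str]:
    """Return words from the correction that the ASR output got wrong or missed.

    Hypothetical helper: aligns the two transcripts at word level and keeps
    the corrected side of every replaced or inserted span.
    """
    hyp = asr_text.lower().split()
    ref = corrected_text.lower().split()
    matcher = SequenceMatcher(a=hyp, b=ref, autojunk=False)

    targets: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace": ASR produced a different word; "insert": ASR dropped a word.
        if tag in ("replace", "insert"):
            targets.extend(ref[j1:j2])
    return targets


print(misrecognized_words("i want to by a coughing", "i want to buy a coffee"))
# ['buy', 'coffee']
```

A real system would likely also need punctuation handling and a way to map the collected words to phonetically similar practice phrases, neither of which is described in this summary.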
