Adapting Whisper for Lightweight and Efficient Automatic Speech Recognition of Children for On-device Edge Applications

📅 2025-07-18
🤖 AI Summary
To address privacy concerns and computational constraints in deploying automatic speech recognition (ASR) for children’s speech on edge devices, this work adapts the Whisper tiny.en model into a lightweight, child-optimized, on-device ASR system. The pipeline combines child speech data filtering, fine-tuning on the MyST corpus, and low-rank compression of the encoder. On MyST, the fine-tuned model reaches a word error rate (WER) of 15.9% (11.8% on the filtered data). Low-rank compression reduces the encoder size by 0.51M and speeds up GPU inference by 1.26×, at the cost of an 11% relative WER increase. On a Raspberry Pi, the compressed model requires approximately 2 GFLOPS less computation, and both models run with a real-time factor of 0.23–0.41. The result is an edge ASR system for children’s speech that keeps audio on the device while running faster than real time.
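The word error rate quoted above is conventionally the word-level edit distance between reference and hypothesis transcripts, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name and the example sentences are illustrative, and real evaluations usually normalize text (case, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substitution plus one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```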

📝 Abstract
Reliance on cloud providers for ASR inference to support child-centered voice-based applications is becoming challenging due to regulatory and privacy concerns. Motivated by a privacy-preserving design, this study aims to develop a lightweight and efficient Whisper ASR system capable of running on a Raspberry Pi. By evaluating on the MyST corpus and examining various filtering strategies to fine-tune the 'tiny.en' model, a Word Error Rate (WER) of 15.9% was achieved (11.8% filtered). Low-rank compression reduces the encoder size by 0.51M and gives 1.26x faster inference on a GPU, with an 11% relative WER increase. During inference on the Pi, the compressed version required ~2 GFLOPS fewer computations. The real-time factor (RTF) for both models ranged from 0.23 to 0.41 across various input audio durations. Analysis of RAM usage and CPU temperature showed that the Pi was capable of handling both tiny models; the 'small' models, however, introduced additional overhead and thermal throttling.
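Two of the metrics in the abstract are simple ratios. The real-time factor is processing time divided by audio duration, so values below 1 mean faster than real time, and a relative WER increase is the change divided by the baseline. A minimal sketch follows, assuming a hypothetical `transcribe` callable and illustrative numbers that are not taken from the paper.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def relative_increase(baseline: float, new: float) -> float:
    """Relative change of a metric with respect to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


def measure_rtf(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to a (hypothetical) transcribe function and return its RTF."""
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# Illustrative numbers: 10 s of audio decoded in 2.3 s gives an RTF of about 0.23.
print(round(real_time_factor(2.3, 10.0), 2))
# Illustrative numbers: a WER moving from 10.0% to 11.1% is an 11% relative increase.
print(round(relative_increase(10.0, 11.1), 2))
```

On a device such as the Pi, timing several runs per audio duration and reporting the range would be one way to arrive at an interval like the one quoted in the abstract.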
Problem

Research questions and friction points this paper is trying to address.

Develop lightweight Whisper ASR for children's speech on Raspberry Pi
Address privacy concerns in cloud-based ASR for child-centered applications
Optimize model efficiency with low-rank compression and filtering strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight Whisper ASR for child speech recognition
Low-rank compression reduces encoder size
Optimized for Raspberry Pi with efficient inference
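The size reduction from low-rank compression can be understood by counting parameters: a dense weight matrix of shape d_out × d_in is replaced by two factors of shapes d_out × r and r × d_in, which is smaller whenever r < d_out·d_in / (d_out + d_in). The sketch below shows only this arithmetic. The 384 × 384 shape assumes the hidden width commonly reported for Whisper tiny, and the rank of 96 is hypothetical; neither the layers compressed nor the ranks used by the authors are stated here.

```python
def dense_params(d_out: int, d_in: int) -> int:
    """Parameter count of a dense weight matrix (bias ignored)."""
    return d_out * d_in


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameter count after factoring W (d_out x d_in) into A (d_out x r) @ B (r x d_in)."""
    return rank * (d_out + d_in)


def break_even_rank(d_out: int, d_in: int) -> float:
    """Ranks strictly below this value reduce the parameter count."""
    return d_out * d_in / (d_out + d_in)


# Assumed shape (384 x 384) and hypothetical rank (96), for illustration only.
d_out, d_in, rank = 384, 384, 96
saved = dense_params(d_out, d_in) - low_rank_params(d_out, d_in, rank)
print(f"dense: {dense_params(d_out, d_in)}, low-rank: {low_rank_params(d_out, d_in, rank)}")
print(f"saved per matrix: {saved}, break-even rank: {break_even_rank(d_out, d_in):.0f}")
```

Lower ranks save more parameters and computation but discard more of the original weights, which is consistent with the accuracy trade-off the abstract reports for the compressed model.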
Satwik Dutta
Ph.D. Candidate, Center for Robust Speech Systems, University of Texas at Dallas
Speech Recognition, Speech Disorders, Speech Processing, Child Speech
Shruthigna Chandupatla
Center for Robust Speech Systems, The University of Texas at Dallas, USA
John H.L. Hansen
Center for Robust Speech Systems, The University of Texas at Dallas, USA