Massive Open-Vocabulary Keyword Spotting

📅 2026-06-09
🤖 AI Summary
This work addresses two limitations: automatic speech recognition systems transcribe rare, domain-specific terms poorly, and conventional open-vocabulary keyword spotting methods do not scale to large lexicons. The authors propose an open-vocabulary keyword spotting framework that combines contextual biasing with compressed feature storage, improving recognition of rare terms without fine-tuning the underlying speech recognition model. The compressed features reduce memory consumption by up to 128× relative to a comparable baseline, which makes keyword spotting over massive terminology databases feasible. Entity recall remains comparable to that of uncompressed approaches, including on languages not seen during training.
📝 Abstract
Automatic speech recognition systems have been shown to underperform when transcribing words rarely seen in the training data, namely specialized terminology. Open-vocabulary keyword spotting, combined with contextual biasing, has been shown to mitigate this issue. However, existing systems can handle glossaries of only a few hundred terms before becoming an infeasible bottleneck. We propose a system that stores features with a memory footprint up to 128 times smaller than a comparable baseline and allows users to process massive databases while remaining open-vocabulary. Without fine-tuning the speech recognition model, our system achieves entity recall comparable to that of uncompressed solutions, even in languages not seen during training.
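To give a sense of scale for the "up to 128 times smaller" figure, the following is a rough back-of-the-envelope sketch, not the paper's method. Only the 128× factor comes from the abstract. The glossary size, the number of stored values per term, and the float32 storage format are hypothetical assumptions chosen for illustration, and the helper names are invented here.

```python
# Back-of-the-envelope memory estimate for a keyword feature store.
# ASSUMPTIONS (not from the paper): glossary size, values stored per term,
# and float32 storage. Only the 128x ratio is taken from the abstract,
# where it is stated as an upper bound ("up to").

def feature_store_bytes(num_terms: int, values_per_term: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store a fixed number of values for every glossary term."""
    return num_terms * values_per_term * bytes_per_value


def compressed_bytes(uncompressed: int, ratio: int = 128) -> int:
    """Footprint after dividing by the compression ratio, rounded up."""
    return -(-uncompressed // ratio)  # ceiling division


NUM_TERMS = 1_000_000     # hypothetical "massive" glossary
VALUES_PER_TERM = 4096    # hypothetical number of stored values per term

baseline = feature_store_bytes(NUM_TERMS, VALUES_PER_TERM)
compact = compressed_bytes(baseline)

print(f"uncompressed: {baseline / 1e9:.3f} GB")
print(f"compressed:   {compact / 1e6:.1f} MB")
```

Under these assumed sizes, the uncompressed store would need about 16.4 GB, while a 128× reduction brings it to 128 MB. Actual figures depend on the feature dimensions and storage format the system uses, which the abstract does not specify.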
Problem

Research questions and friction points this paper is trying to address.

open-vocabulary keyword spotting
automatic speech recognition
specialized terminology
massive glossaries
contextual biasing
Innovation

Methods, ideas, or system contributions that make the work stand out.

open-vocabulary keyword spotting
contextual biasing
feature compression
massive glossary
zero-shot language transfer