Massive Open-Vocabulary Keyword Spotting

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems: automatic speech recognition systems often fail to recognize rare, domain-specific terms, and conventional open-vocabulary keyword spotting methods scale poorly to large lexicons. The authors propose an open-vocabulary keyword detection framework that combines contextual biasing with compressed feature storage, improving recognition of rare terms without fine-tuning the underlying speech recognition model. Through its feature representation and fusion mechanism, the method reduces memory consumption by up to 128× while enabling real-time detection over large-scale terminology banks for the first time. It also maintains entity recall on unseen languages comparable to that of uncompressed approaches, which improves the system's scalability and practical applicability.

📝 Abstract

Automatic speech recognition systems have been shown to underperform when transcribing words rarely seen in the training data, in particular specialized terminology. Open-vocabulary keyword spotting, combined with contextual biasing, has been shown to mitigate this issue. However, existing systems can handle glossaries of only a few hundred terms before keyword spotting becomes a bottleneck. We propose a system that stores features with a memory footprint up to 128 times smaller than a comparable baseline, allowing users to process massive databases while remaining open-vocabulary. Without fine-tuning the speech recognition model, our system achieves entity recall comparable to that of uncompressed solutions, even in languages not seen during training.
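To give a sense of scale for the "up to 128 times smaller" figure, the following is a back-of-the-envelope sketch, not the paper's storage scheme. The glossary size, frames per term, feature dimension, and float32 storage are all hypothetical values chosen for illustration; only the factor of 128 comes from the abstract, and the helper name `footprint_bytes` is made up here.

```python
def footprint_bytes(num_terms: int, frames_per_term: int,
                    feature_dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one dense feature matrix per glossary term."""
    return num_terms * frames_per_term * feature_dim * bytes_per_value


# Upper bound on the reduction reported in the abstract.
COMPRESSION_FACTOR = 128

# Hypothetical glossary and feature shape (assumptions, not from the paper).
baseline = footprint_bytes(num_terms=1_000_000, frames_per_term=50, feature_dim=512)
compressed = baseline // COMPRESSION_FACTOR

print(f"baseline:   {baseline / 2**30:.1f} GiB")
print(f"compressed: {compressed / 2**20:.1f} MiB")
```

Under these assumed numbers, an uncompressed store of roughly 95 GiB would shrink to under 1 GiB, which is the difference between a glossary that does not fit in the memory of a typical machine and one that does.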

Problem

Research questions and friction points this paper is trying to address.

open-vocabulary keyword spotting

automatic speech recognition

specialized terminology

massive glossaries

contextual biasing

Innovation

Methods, ideas, or system contributions that make the work stand out.

open-vocabulary keyword spotting

contextual biasing

feature compression