🤖 AI Summary
This work addresses the prohibitive memory consumption (roughly 320 GB of RAM) of conventional WFST-based CTC decoders in speech neuroprosthetics, which severely limits their practical deployment. To overcome this limitation, the authors propose LightBeam, a non-WFST CTC decoder that, for the first time, integrates a large language model (LLM) into beam search via delayed fusion, replacing the traditional large N-gram language model and eliminating the reliance on WFST structures. LightBeam maintains high decoding accuracy while reducing memory usage to roughly 10 GB, and achieves state-of-the-art performance on the Brain-to-Text '24 and '25 benchmarks.
📝 Abstract
A promising pathway for restoring communication in patients with dysarthria and anarthria is speech neuroprostheses, which directly decode speech from cortical neural activity. Two benchmarks, Brain-to-Text '24 and '25, released intracranial recordings from patients with dysarthria along with a baseline algorithm trained with Connectionist Temporal Classification (CTC). Despite significant innovation on these benchmarks, all leading published prior work relies on a WFST-based CTC decoder that requires ~320 GB of RAM. These memory requirements limit accessibility for both patients and researchers. Here, we propose LightBeam, a non-WFST-based CTC decoder that requires only ~10 GB of RAM and achieves state-of-the-art performance on both benchmarks. LightBeam achieves this by integrating an LLM into the beam-search process via delayed fusion, obviating the prior need for a large N-gram LM. LightBeam is implemented in Python and is open-source.
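To make the decoding idea concrete, below is a minimal sketch of CTC prefix beam search with delayed LM fusion: instead of composing the acoustic model with a large WFST, candidate prefixes are kept in a plain beam, and a language-model score is folded in only when a word boundary is emitted. The `llm_score` callback, the space-triggered fusion point, and `llm_weight` are illustrative assumptions, not the paper's actual implementation.

```python
import math
from collections import defaultdict

def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b)) for two values."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def ctc_beam_search(log_probs, alphabet, beam_width=8, blank=0,
                    llm_score=None, llm_weight=0.5):
    """CTC prefix beam search with (hypothetical) delayed LM fusion.

    log_probs : T x V list of per-frame log-probabilities.
    llm_score : optional callback mapping a text prefix to a log-probability;
                applied only when a word boundary (' ') is emitted, i.e.
                fusion is delayed to word completions rather than per frame.
    """
    # Each prefix maps to (log p ending in blank, log p ending in non-blank).
    beams = {(): (0.0, -math.inf)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (-math.inf, -math.inf))
        for prefix, (p_b, p_nb) in beams.items():
            p_total = logsumexp2(p_b, p_nb)
            for s, p in enumerate(frame):
                if s == blank:
                    # Blank extends the prefix without emitting a symbol.
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp2(nb_b, p_total + p), nb_nb)
                    continue
                ch = alphabet[s]
                new_prefix = prefix + (ch,)
                nb_b, nb_nb = next_beams[new_prefix]
                if prefix and prefix[-1] == ch:
                    # Repeated symbol: a new emission must come via a blank.
                    next_beams[new_prefix] = (nb_b, logsumexp2(nb_nb, p_b + p))
                    ob_b, ob_nb = next_beams[prefix]
                    next_beams[prefix] = (ob_b, logsumexp2(ob_nb, p_nb + p))
                else:
                    score = p_total + p
                    if llm_score is not None and ch == ' ':
                        # Delayed fusion: add the LM score at word boundaries.
                        score += llm_weight * llm_score(''.join(prefix))
                    next_beams[new_prefix] = (nb_b, logsumexp2(nb_nb, score))
        # Prune to the top-scoring prefixes.
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: logsumexp2(*kv[1]),
                            reverse=True)[:beam_width])
    best_prefix, _ = max(beams.items(), key=lambda kv: logsumexp2(*kv[1]))
    return ''.join(best_prefix)
```

Because the LM is consulted only at word boundaries on a small set of live beams, this style of decoder never needs the ~320 GB composed search graph in memory, at the cost of scoring candidates on the fly.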