🤖 AI Summary
This study addresses the difficulty of recognizing spoken word suggestions for non-visual text input when multiple voices play at once. We propose a dual-word simultaneous playback paradigm: rather than speaking suggestions one after another, two words are played with slightly staggered onsets. Through two perceptual studies with precise temporal manipulation of the speech stimuli, we find that a 150 ms onset asynchrony between two overlapping words yields 84% word-recognition accuracy, statistically indistinguishable from conventional sequential playback (86%), while presenting the suggestions 32% faster. The word sets simulated the suggestions of a predictive keyboard during real-world text input. These results show that listeners tolerate slight onset asynchrony in multi-word speech, and that this tolerance can be exploited to speed up auditory presentation without sacrificing accuracy, offering a practical design for accessible auditory-assisted input systems.
📝 Abstract
We explore a method for presenting word suggestions for non-visual text input using simultaneous voices. We conduct two perceptual studies to investigate how different presentations of voices affect a user's ability to detect which voice, if any, spoke their desired word. Our sets of words simulated the word suggestions of a predictive keyboard during real-world text input. We find that when voices are fully simultaneous, user accuracy decreases significantly with each added word suggestion. However, adding a slight 0.15 s delay between the start of each subsequent word allows two overlapping words to be presented with no significant decrease in accuracy compared to presenting two words sequentially (84% simultaneous versus 86% sequential). This allows two word suggestions to be presented to the user 32% faster than sequential playback without decreasing accuracy.
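To make the playback paradigm concrete, the staggered-onset mixing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample rate, the helper name `mix_with_onset_delay`, and the use of sine tones as stand-ins for synthesized speech are all assumptions for the example.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed sample rate (Hz); not specified in the paper
ONSET_DELAY_S = 0.15   # the 0.15 s onset stagger between successive words

def mix_with_onset_delay(words, sample_rate=SAMPLE_RATE, delay_s=ONSET_DELAY_S):
    """Overlap mono word waveforms, starting each word delay_s after the previous.

    words: list of 1-D float arrays, one per word suggestion.
    Returns a single 1-D array containing the overlapped mix.
    """
    delay = int(round(delay_s * sample_rate))
    total = max(i * delay + len(w) for i, w in enumerate(words))
    mix = np.zeros(total, dtype=np.float64)
    for i, w in enumerate(words):
        start = i * delay
        mix[start:start + len(w)] += w  # sum overlapping regions
    return mix

# Two 0.5 s placeholder "words" (tones standing in for synthesized speech).
t = np.linspace(0, 0.5, int(0.5 * SAMPLE_RATE), endpoint=False)
word_a = 0.3 * np.sin(2 * np.pi * 220 * t)
word_b = 0.3 * np.sin(2 * np.pi * 330 * t)

mix = mix_with_onset_delay([word_a, word_b])
```

For two equal-length 0.5 s words, the staggered mix lasts 0.65 s instead of the 1.0 s needed for sequential playback, which is the kind of time saving behind the 32% speed-up reported in the study (actual savings depend on word durations).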