Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the unnatural inter-sentence silences in conventional real-time game commentary systems, which average 9.6 seconds in sequential baselines. These silences arise because frame capture, text generation, and speech synthesis run strictly in sequence, and the next generation request is not issued until playback has finished. To mitigate this latency bottleneck, the authors propose an end-to-end low-latency commentary architecture that runs large language model-based text generation in parallel with speech playback and pre-buffers multiple candidate sentences, so that synthesis can start immediately at utterance boundaries. The proposed approach reduces the mean inter-sentence silence to 0.3 seconds and improves similarity to the speaking-silence timing patterns of professional commentators by over 40%. A user study involving 120 experienced gamers demonstrates that the system significantly enhances the naturalness of speaking rhythm and overall immersion.

📝 Abstract

We present a low-latency real-time audio game commentary system that generates spoken commentary directly from live gameplay video. In this end-to-end setting, a key bottleneck is accumulated waiting time; conventional pipelines capture frames, generate text, and synthesize speech sequentially for each utterance, and do not request the next generation until speech playback has completed. This strict sequentiality causes long and unnatural silence between utterances. To address this latency bottleneck, our system runs text generation in parallel with speech playback and buffers multiple candidate utterances ahead of time, enabling immediate synthesis at playback boundaries. Experiments on fast-paced game videos show that our parallel design reduces the mean inter-utterance silence from 9.6 seconds to 0.3 seconds compared to sequential baselines. It also improves similarity to professional speaking-silence timing patterns by over 40%, and a user study with 120 experienced game players confirms significantly improved perceived speaking rhythm. Our demo video is available at: https://youtu.be/pmrRUlvav8M.

Problem

Research questions and friction points this paper is trying to address.

low-latency

real-time audio commentary

inter-utterance silence

sequential pipeline

spoken rhythm

Innovation

Methods, ideas, or system contributions that make the work stand out.

low-latency

real-time audio commentary

parallel text generation