Uni-ASR: Unified LLM-Based Architecture for Non-Streaming and Streaming Automatic Speech Recognition

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes Uni-ASR, a unified end-to-end automatic speech recognition (ASR) architecture based on large language models (LLMs) that supports both non-streaming and streaming recognition in a single model, with no architectural modifications needed to switch between them. Existing LLM-based ASR systems often require separate designs for each mode and are difficult to deploy in low-latency streaming scenarios. Uni-ASR addresses this through joint training across both modes and adds two further contributions: context-aware training and a co-designed fallback decoding strategy, which together improve streaming recognition accuracy without introducing additional latency. Experimental results show that Uni-ASR achieves competitive performance in non-streaming mode and performs strongly in streaming scenarios under diverse latency constraints.

📝 Abstract

Although the deep integration of Automatic Speech Recognition (ASR) systems with Large Language Models (LLMs) has significantly improved accuracy, deploying such systems in low-latency streaming scenarios remains challenging. In this paper, we propose Uni-ASR, a unified LLM-based framework that integrates both non-streaming and streaming speech recognition capabilities. We propose a joint training paradigm that enables the system to transition seamlessly between the two recognition modes without any architectural modifications. Furthermore, we introduce a context-aware training paradigm and a co-designed fallback decoding strategy, which enhance streaming recognition accuracy without introducing additional latency. The experimental results demonstrate that Uni-ASR not only achieves competitive performance in non-streaming mode, but is also effective in streaming scenarios under diverse latency constraints.

Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition

Large Language Models

Streaming ASR

Non-Streaming ASR

Low-Latency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified ASR

Large Language Models

Streaming Speech Recognition