Entropy-Aware Domain-Routed Mixture-of-Experts Speech-LLM Framework: A Case Study of Multi-Domain Child-Adult ASR

📅 2026-06-09

🤖 AI Summary

This work addresses a limitation of current Speech Large Language Models (Speech-LLMs): they perform well on adult speech recognition but struggle with children's speech, and a single model has difficulty covering several age groups and recording environments at once. The authors propose a Mixture-of-Experts (MoE) Speech-LLM for unified automatic speech recognition (ASR) which, to their knowledge, is the first to apply Speech-LLMs to multi-domain ASR covering both children and adults. The approach combines a Classifier-based Domain Router (C-DR) that follows a coarse-to-fine strategy, a Mixture-of-Projectors (MoP) and a Mixture-of-LoRAs (MoL) that model domain-specific variation, and an Entropy-Aware Routing (EAR) mechanism that dynamically incorporates a shared expert to reduce routing uncertainty near domain boundaries. Experiments show consistent improvements over baselines on public child speech corpora while preserving adult ASR performance.
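
The summary does not spell out how the routed experts are applied. As a rough illustration only, the sketch below assumes hard (top-1) routing: a hypothetical two-stage classifier first picks a coarse group (adult or child) and then a fine domain inside that group, and the chosen domain's LoRA adapter is added to a frozen base weight, giving W0·x + (α/r)·B·A·x. The domain labels, toy weights, and helper names (`LoRAExpert`, `coarse_to_fine_route`, `mol_forward`) are invented for this example and are not taken from the paper; a Mixture-of-Projectors could route projector modules in the same way.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


class LoRAExpert:
    """One low-rank adapter: delta(x) = (alpha / r) * B @ (A @ x)."""

    def __init__(self, a: Matrix, b: Matrix, alpha: float) -> None:
        self.a = a                   # shape (r, d_in)
        self.b = b                   # shape (d_out, r)
        self.scale = alpha / len(a)  # rank r = number of rows of A

    def delta(self, x: Sequence[float]) -> List[float]:
        return [self.scale * v for v in matvec(self.b, matvec(self.a, x))]


def coarse_to_fine_route(
    coarse_logits: Sequence[float],
    fine_logits: Dict[str, Sequence[float]],
    taxonomy: Dict[str, List[str]],
) -> str:
    """Hypothetical two-stage decision: pick a coarse group first,
    then a fine domain inside that group."""
    group = list(taxonomy)[argmax(coarse_logits)]
    return taxonomy[group][argmax(fine_logits[group])]


def mol_forward(
    x: Sequence[float],
    w0: Matrix,
    experts: Dict[str, LoRAExpert],
    domain: str,
) -> List[float]:
    """Frozen base weight W0 plus the LoRA expert chosen by the router."""
    base = matvec(w0, x)
    return [b + d for b, d in zip(base, experts[domain].delta(x))]


# Hypothetical domain taxonomy and toy rank-1 adapters, for illustration only.
taxonomy = {
    "adult": ["adult_read", "adult_spont"],
    "child": ["child_young", "child_older"],
}
experts = {
    "adult_read": LoRAExpert([[0.0, 1.0]], [[0.0], [1.0]], alpha=1.0),
    "adult_spont": LoRAExpert([[1.0, 1.0]], [[0.0], [0.5]], alpha=1.0),
    "child_young": LoRAExpert([[1.0, 0.0]], [[0.5], [0.0]], alpha=2.0),
    "child_older": LoRAExpert([[0.5, 0.5]], [[0.5], [0.5]], alpha=1.0),
}
w0 = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]

domain = coarse_to_fine_route(
    [0.2, 1.5], {"adult": [0.0, 0.0], "child": [2.0, -1.0]}, taxonomy
)
y = mol_forward(x, w0, experts, domain)
print(domain, y)  # -> child_young [2.0, 2.0]
```

Because this sketch commits to one domain per input, only one adapter runs per forward pass; a soft mixture weighted by the classifier's probabilities would be an equally plausible reading of the summary.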

📝 Abstract

While Speech Large Language Models (Speech-LLMs) have achieved strong performance on adult Automatic Speech Recognition (ASR), their effectiveness on child speech remains under-explored, and single models often struggle to handle diverse adult and child age groups simultaneously. This paper proposes a Mixture-of-Experts (MoE) Speech-LLM for unified ASR across adult and child speech spanning diverse environments and age groups. The framework employs a Classifier-based Domain Router (C-DR) with a coarse-to-fine strategy and integrates both a Mixture-of-Projectors (MoP) and a Mixture-of-LoRAs (MoL) to model domain-specific variations. To address routing uncertainty near domain boundaries, an Entropy-Aware Routing (EAR) mechanism is introduced to dynamically incorporate a shared expert. Experiments on public child corpora demonstrate consistent improvements over baselines while preserving adult ASR performance. To our knowledge, this is the first work leveraging Speech-LLMs for unified, multi-domain ASR encompassing both children and adults.
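
The abstract does not give the EAR formula. One plausible reading, sketched below under that assumption, measures routing uncertainty as the router's normalized entropy λ = H(p) / log K, which is 0 for a one-hot distribution and 1 for a uniform one, and uses λ as the weight of a shared expert blended with the top-1 domain expert. The linear blend, the function names (`normalized_entropy`, `entropy_aware_mix`), and the toy expert outputs are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(K): 0 for one-hot, 1 for uniform."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def entropy_aware_mix(
    router_logits: Sequence[float],
    expert_outputs: Sequence[Sequence[float]],
    shared_output: Sequence[float],
) -> Tuple[List[float], float]:
    """Hypothetical EAR-style fusion: blend the top-1 domain expert with a
    shared expert, giving the shared expert more weight as router entropy
    rises (e.g. for inputs near a domain boundary)."""
    probs = softmax(router_logits)
    top = max(range(len(probs)), key=lambda i: probs[i])
    lam = normalized_entropy(probs)
    routed = expert_outputs[top]
    mixed = [(1.0 - lam) * r + lam * s for r, s in zip(routed, shared_output)]
    return mixed, lam


experts = [[1.0, 1.0], [3.0, 3.0]]  # toy outputs of two domain experts
shared = [10.0, 10.0]               # toy output of the shared expert

confident, lam_c = entropy_aware_mix([20.0, 0.0], experts, shared)
uncertain, lam_u = entropy_aware_mix([0.0, 0.0], experts, shared)
print(lam_c, confident)
print(lam_u, uncertain)
```

With a confident router the output stays essentially equal to the selected domain expert, while a uniform router falls back entirely to the shared expert. A hard threshold on λ (adding the shared expert only above some cutoff) would be another reasonable way to "dynamically incorporate" it.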

Problem

Speech-LLM

Automatic Speech Recognition

Child Speech

Multi-Domain ASR

Age Groups

Innovation

Mixture-of-Experts

Entropy-Aware Routing

Speech-LLM