🤖 AI Summary
This work investigates whether large language models' linguistic understanding degrades when they are accessed through speech input. To address this, the authors introduce C3T, the first dedicated benchmark to combine text-understanding tasks, controllable voice cloning, and a cross-modal evaluation framework, systematically quantifying performance degradation and speaker invariance under speech input. C3T decouples phonetic attributes (e.g., timbre, speaking rate, accent) from semantic content, enabling fine-grained assessment of cross-modal robustness and fairness. Experiments reveal substantial comprehension deterioration across mainstream speech-language models, with performance significantly influenced by speaker attributes such as gender, age, and accent, exposing latent biases and fragility. C3T thus provides a reproducible, interpretable, and attribute-aware evaluation standard for joint speech-language modeling.
📝 Abstract
The paper presents C3T (Cross-modal Capabilities Conservation Test), a new benchmark for assessing the performance of speech-aware large language models. The benchmark pairs textual tasks with a voice-cloning text-to-speech model to quantify the extent to which language-understanding capabilities are preserved when the model is accessed via speech input. C3T also measures the model's fairness across different categories of speakers and its robustness across text and speech modalities.
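The core idea of the evaluation (comparing a model's accuracy on the same tasks in text versus cloned-speech form, broken down by speaker attribute) can be sketched as follows. This is a minimal illustration, not the paper's actual scoring code: the record format, attribute values, and the `conservation_by_group` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-item results: (speaker_attribute, correct_on_text, correct_on_speech).
# The attribute values and outcomes below are illustrative, not from the paper.
results = [
    ("female", True, True),
    ("female", True, False),
    ("male",   True, True),
    ("male",   False, False),
    ("male",   True, True),
]

def accuracy(flags):
    return sum(flags) / len(flags)

def conservation_by_group(records):
    """Ratio of speech accuracy to text accuracy per speaker group.

    1.0 means understanding is fully conserved under speech input;
    lower values indicate cross-modal degradation for that group.
    """
    groups = defaultdict(lambda: ([], []))
    for attr, text_ok, speech_ok in records:
        groups[attr][0].append(text_ok)
        groups[attr][1].append(speech_ok)
    return {
        attr: accuracy(speech) / accuracy(text)
        for attr, (text, speech) in groups.items()
        if accuracy(text) > 0  # skip groups the model fails even in text
    }

scores = conservation_by_group(results)
# A large gap in conservation between groups would signal the kind of
# speaker-attribute bias the benchmark is designed to expose.
```

Comparing these per-group conservation ratios is one simple way to express both the robustness claim (how far each ratio falls below 1.0) and the fairness claim (how much the ratios differ across speaker categories).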