Large Language Model Data Generation for Enhanced Intent Recognition in German Speech

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of annotated speech data for German intent recognition, particularly in elderly spoken language, and the resulting limits on model robustness, this paper proposes a synthetic data augmentation framework for low-resource settings. First, Whisper is fine-tuned to improve automatic speech recognition (ASR) accuracy on elderly German speech. Second, LeoLM, a smaller domain-specific language model, generates high-quality, semantically coherent German training utterances paired with intent labels, outperforming ChatGPT and Llama3 in relevance and fidelity. Third, a Transformer-based intent classifier is trained on this synthetic data and evaluated on text-to-speech (TTS)-synthesized utterances to assess cross-style generalization. Experiments show substantial gains in classification accuracy and markedly improved robustness across diverse speaking styles, regional accents, and out-of-vocabulary terms. The approach mitigates the bottleneck of scarce real-world annotations and demonstrates the practical utility of lightweight generative AI for low-resource spoken language understanding.

📝 Abstract
Intent recognition (IR) for speech commands is essential for artificial intelligence (AI) assistant systems; however, most existing approaches are limited to short commands and are predominantly developed for English. This paper addresses these limitations by focusing on IR from speech by elderly German speakers. We propose a novel approach that combines an adapted Whisper ASR model, fine-tuned on elderly German speech (SVC-de), with Transformer-based language models trained on synthetic text datasets generated by three well-known large language models (LLMs): LeoLM, Llama3, and ChatGPT. To evaluate the robustness of our approach, we generate synthetic speech with a text-to-speech model and conduct extensive cross-dataset testing. Our results show that synthetic LLM-generated data significantly boosts classification performance and robustness to different speaking styles and unseen vocabulary. Notably, we find that LeoLM, a smaller, domain-specific 13B LLM, surpasses the much larger ChatGPT (175B) in dataset quality for German intent recognition. Our approach demonstrates that generative AI can effectively bridge data gaps in low-resource domains. We provide detailed documentation of our data generation and training process to ensure transparency and reproducibility.
Problem

Research questions and friction points this paper is trying to address.

Enhancing German speech intent recognition for elderly speakers
Overcoming data scarcity with LLM-generated synthetic datasets
Comparing performance of domain-specific vs. general-purpose LLMs for German IR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapted Whisper ASR model for German speech
Transformer-based LMs trained on LLM-generated data
Synthetic speech for cross-dataset robustness testing
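The pipeline above (prompt an LLM per intent class, train a classifier on the generated text, then score robustness on held-out synthetic speech transcripts) can be sketched minimally as follows. The intent labels, the German prompt wording, and the function names here are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of the LLM data-generation and evaluation loop.
# Intent names and prompt text are invented for illustration.

INTENTS = ["licht_an", "licht_aus", "notruf"]  # example intent classes

def build_prompt(intent: str, n: int = 5) -> str:
    """Build a German prompt asking an LLM (e.g. LeoLM) for n paraphrased
    spoken commands expressing a single intent."""
    return (
        f"Generiere {n} verschiedene gesprochene deutsche Saetze, "
        f"mit denen eine aeltere Person die Absicht '{intent}' "
        f"ausdruecken koennte."
    )

def accuracy(predictions: list[str], gold_labels: list[str]) -> float:
    """Cross-dataset robustness metric: fraction of correctly
    predicted intent labels on TTS-derived test transcripts."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# One prompt per intent would be sent to the generator LLM; the returned
# utterances, paired with their intent, form the synthetic training set.
prompts = {intent: build_prompt(intent) for intent in INTENTS}
```

In this sketch the LLM call itself is left out; in practice each prompt would be sent to the generator model and the responses filtered for duplicates before training the Transformer classifier.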
👥 Authors
Theresa Pekarek Rosin (University of Hamburg, Knowledge Technology)
Burak Can Kaplan (University of Hamburg, Artificial Intelligence)
Stefan Wermter (University of Hamburg, Knowledge Technology)