Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking

📅 2024-09-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Dialogue State Tracking (DST) in speech-driven task-oriented dialogues is highly vulnerable to Automatic Speech Recognition (ASR) errors, especially misrecognized named entities, which cause substantial performance degradation. To address this, we propose a keyword-aware, controllable error augmentation method: first, prompt-based learning identifies critical slot-value positions; second, a phoneme-similarity confusion model injects semantically plausible, acoustically similar synthetic errors *only* at the identified positions, producing high-quality noisy training data; and third, end-to-end fine-tuning on this data improves DST robustness. This is the first approach to jointly integrate keyword localization with phoneme-level modeling for controllable, targeted error generation. Empirical results show significant improvements in DST accuracy across diverse ASR noise types, with particularly pronounced gains under extreme noise conditions, e.g., when ASR word accuracy falls below 80%.
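The core idea of targeted injection (corrupt only the slot-value spans, leave everything else intact) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the confusion table `CONFUSIONS` and the functions `find_keyword_spans`, `corrupt_span`, and `augment` are hypothetical, and the hand-picked substitutes stand in for the phoneme-similarity model the summary describes.

```python
import random
import re
from typing import Dict, List, Tuple

# Hypothetical confusion table: each word maps to acoustically similar
# substitutes. The pairs are hand-picked for illustration; the paper derives
# its errors from phonetic similarity, not from a fixed table like this.
CONFUSIONS: Dict[str, List[str]] = {
    "cambridge": ["came bridge", "cambrage"],
    "curry": ["carry", "cory"],
    "garden": ["guardin", "gardens"],
}


def find_keyword_spans(utterance: str, slot_values: List[str]) -> List[Tuple[int, int]]:
    """Return sorted (start, end) character spans where a slot value occurs."""
    spans = []
    for value in slot_values:
        for m in re.finditer(re.escape(value), utterance, flags=re.IGNORECASE):
            spans.append((m.start(), m.end()))
    return sorted(spans)


def corrupt_span(text: str, rng: random.Random) -> str:
    """Swap each word in the span for a confusable variant, if the table has one."""
    out = []
    for word in text.split():
        options = CONFUSIONS.get(word.lower())
        out.append(rng.choice(options) if options else word)
    return " ".join(out)


def augment(utterance: str, slot_values: List[str], seed: int = 0) -> str:
    """Inject errors only inside slot-value spans; all other text is untouched."""
    rng = random.Random(seed)
    result = utterance
    boundary = len(utterance) + 1  # start of the most recently edited span
    # Edit right-to-left so offsets of spans further left stay valid.
    for start, end in reversed(find_keyword_spans(utterance, slot_values)):
        if end > boundary:  # overlaps a span that was already edited
            continue
        result = result[:start] + corrupt_span(result[start:end], rng) + result[end:]
        boundary = start
    return result
```

The noisy utterance would then be paired with the original, clean dialogue state (for example food=curry, area=cambridge) as the training target, so the DST model learns to recover correct slot values from misrecognized input.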

📝 Abstract
Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of DST models. Our novel method controls the placement of errors using keyword-highlighted prompts while introducing phonetically similar errors. As a result, our method generates sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
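Keyword-highlighted prompting might look roughly like the following. The abstract does not give the prompt format, so the marker tokens `<kw>`/`</kw>`, the instruction wording, and the helpers `highlight_keywords` and `build_prompt` are assumptions for illustration; the resulting string would be sent to an LLM of your choice, and no model call is shown.

```python
import re
from typing import List

# Hypothetical marker tokens; the paper's actual prompt format may differ.
OPEN, CLOSE = "<kw>", "</kw>"


def highlight_keywords(utterance: str, keywords: List[str]) -> str:
    """Wrap every case-insensitive keyword occurrence in marker tokens."""
    if not keywords:
        return utterance
    # Longest keywords first so a longer entity wins over its own prefix.
    pattern = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    return re.sub(pattern, lambda m: f"{OPEN}{m.group(0)}{CLOSE}", utterance,
                  flags=re.IGNORECASE)


def build_prompt(utterance: str, keywords: List[str]) -> str:
    """Assemble an illustrative instruction asking an LLM to corrupt only keywords."""
    marked = highlight_keywords(utterance, keywords)
    return (
        "Rewrite the sentence as a speech recognizer might mishear it. "
        f"Change only the words inside {OPEN}...{CLOSE}, replacing them with "
        "phonetically similar words, and copy every other word unchanged.\n"
        f"Sentence: {marked}\n"
        "Output:"
    )
```

Marking the spans explicitly is what gives control over placement: the instruction restricts edits to the highlighted keywords, so the unmarked context, and therefore the dialogue-state labels derived from it, should stay stable.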
Problem

Research questions and friction points this paper is trying to address.

Addresses ASR-induced named entity errors in dialogue state tracking
Improves DST robustness through phonetic error augmentation on entities
Enhances accuracy in noisy speech recognition environments
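The summary characterizes extreme noise as ASR word accuracy below 80%. Word accuracy is commonly computed as 1 minus the word error rate, WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit alignment against a reference of N words. The small sketch below (helper names are ours, not the paper's) shows how a single misheard named entity moves this number.

```python
from typing import List


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    prev = list(range(len(hypothesis) + 1))  # cost of inserting hyp[:j]
    for i, ref_word in enumerate(reference, start=1):
        cur = [i] + [0] * len(hypothesis)  # cost of deleting ref[:i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cur[j] = min(
                prev[j] + 1,  # delete ref_word
                cur[j - 1] + 1,  # insert hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitute or match
            )
        prev = cur
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER; can drop below 0 when the hypothesis has many insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(ref, hyp) / len(ref)
```

For the reference "book the cambridge lodge hotel" and the hypothesis "book the came bridge launch hotel", the best alignment has two substitutions and one insertion, so word accuracy is 1 - 3/5 = 0.4. One misrecognized entity alone can push an utterance well under the 80% mark.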
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven phonetic error augmentation for DST
Controlled error placement using keyword-highlighted prompts
Generating phonetically similar errors on named entities
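To illustrate how candidate substitutes for a named entity can be filtered by sound rather than by spelling, the sketch below swaps in American Soundex, a classic and crude phonetic key. It is not the paper's technique: Soundex works on spelling, not pronunciation, so it only approximates phoneme-level similarity. The functions `soundex` and `phonetic_neighbors` and the candidate vocabulary are hypothetical.

```python
from typing import Dict, List

# American Soundex digit classes; vowels, y, h and w receive no digit.
CODES: Dict[str, str] = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex key of a word (spelling-based)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        # Vowels separate repeated digits; h and w do not.
        if ch not in "hw":
            prev = code
    return (key + "000")[:4]


def phonetic_neighbors(entity: str, vocabulary: List[str]) -> List[str]:
    """Vocabulary words (other than the entity itself) sharing its Soundex key."""
    target = soundex(entity)
    return [w for w in vocabulary
            if w.lower() != entity.lower() and soundex(w) == target]
```

Soundex keys collide often, so a match such as curry and cherry is best treated as a cheap candidate rather than a realistic mishearing; a distance over pronunciation-dictionary phonemes would sit closer to the phoneme-level modeling the summary describes.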