🤖 AI Summary
Dialogue State Tracking (DST) in speech-driven task-oriented dialogues is highly vulnerable to Automatic Speech Recognition (ASR) errors—especially misrecognitions of named entities—leading to substantial performance degradation. To address this, we propose a keyword-aware controllable error augmentation method: first, leveraging prompt-based learning to identify critical slot-value positions; second, constructing a phoneme-similarity confusion model to inject semantically plausible and acoustically similar synthetic errors *only* at the identified positions, thereby generating high-quality noisy training data; and third, performing end-to-end fine-tuning to enhance DST robustness. This is the first approach to jointly integrate keyword localization with phoneme-level modeling for controllable, targeted error generation. Empirical results demonstrate significant improvements in DST accuracy across diverse ASR noise types, with particularly pronounced gains under extreme noise conditions—e.g., when ASR word accuracy falls below 80%.
📝 Abstract
Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of DST models. Our method controls the placement of errors using keyword-highlighted prompts while introducing phonetically similar errors. As a result, it generates sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
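The core idea above — inject acoustically plausible errors *only* at identified slot-value keywords, leaving the rest of the utterance untouched — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the confusion table is hand-written here, whereas the paper derives confusable forms from a phoneme-similarity model, and the keyword set would come from the prompt-based localization step.

```python
import random

# Hypothetical phoneme-similarity confusion table: each keyword maps to
# acoustically similar misrecognitions an ASR system might plausibly emit.
# In the described method, these candidates would come from a phoneme-level
# confusion model rather than a hand-written dictionary.
CONFUSIONS = {
    "cambridge": ["camebridge", "came bridge"],
    "gonville": ["gonvile", "gone ville"],
    "saturday": ["saterday"],
}

def inject_keyword_errors(utterance, keywords, error_rate=1.0, rng=None):
    """Replace only the identified slot-value keywords with phonetically
    similar errors; all non-keyword tokens pass through unchanged."""
    rng = rng or random.Random(0)
    out = []
    for token in utterance.split():
        key = token.lower()
        if key in keywords and key in CONFUSIONS and rng.random() < error_rate:
            out.append(rng.choice(CONFUSIONS[key]))  # sample a confusable form
        else:
            out.append(token)
    return " ".join(out)

noisy = inject_keyword_errors(
    "i need a hotel in cambridge on saturday",
    keywords={"cambridge", "saturday"},
)
```

Pairing such synthetically noised utterances with the original (clean) dialogue-state labels yields the targeted training data used for the robustness fine-tuning stage.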