🤖 AI Summary
This work addresses the challenge in task-oriented dialogue systems where user utterances often express multiple intents, a scenario poorly handled by existing approaches that typically assume single-intent modeling and struggle to jointly optimize multi-intent detection and slot filling. To tackle variable numbers of intents and inter-task interference, the authors propose a generative framework featuring an “attention-over-attention” mechanism within the decoder, which incorporates inductive bias to enable synergistic modeling of both tasks. Leveraging BERT’s next sentence prediction capability, they construct two novel multi-intent spoken language understanding (SLU) datasets and adopt a multi-task joint training strategy. Experimental results demonstrate that the proposed model achieves state-of-the-art performance on MixATIS, MixSNIPS, and the newly introduced datasets, confirming its effectiveness in multi-intent semantic understanding.
📝 Abstract
In task-oriented dialogue systems, spoken language understanding (SLU) is a critical component, which consists of two sub-tasks, intent detection and slot filling. Most existing methods focus on single-intent SLU, where each utterance has only one intent. However, in real-world scenarios, users usually express multiple intents in an utterance, which poses a challenge for existing dialogue systems and datasets. In this paper, we propose a generative framework to simultaneously address multiple intent detection and slot filling. In particular, an attention-over-attention decoder is proposed to handle the variable number of intents and the interference between the two sub-tasks by incorporating an inductive bias into the process of multi-task learning. In addition, we construct two new multi-intent SLU datasets from single-intent utterances by taking advantage of the next sentence prediction (NSP) head of the BERT model. Experimental results demonstrate that our proposed attention-over-attention generative model achieves state-of-the-art performance on two public datasets, MixATIS and MixSNIPS, and on our constructed datasets.
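The abstract does not detail how the NSP head is used to build the multi-intent datasets, but the general idea is to concatenate single-intent utterances only when BERT judges one a plausible continuation of the other. Below is a minimal, hypothetical sketch of that pairing step; `nsp_score` stands in for a real NSP model (e.g. a fine-tuned BERT head), and all function names, the threshold, and the conjunction string are illustrative assumptions, not details from the paper:

```python
import itertools


def nsp_score(first: str, second: str) -> float:
    """Placeholder for BERT's next-sentence-prediction probability.

    In the actual pipeline this would call a pretrained NSP head
    (e.g. transformers' BertForNextSentencePrediction). Here we use a
    trivial length-based stub so the sketch runs without model weights.
    """
    return 0.9 if len(first.split()) + len(second.split()) < 15 else 0.1


def build_multi_intent_pairs(utterances, threshold=0.5, conjunction=" and "):
    """Combine single-intent utterances into synthetic multi-intent samples.

    Each input item is (text, intent). Two utterances with distinct
    intents are merged only when the NSP score says the second plausibly
    follows the first, filtering out incoherent concatenations.
    """
    pairs = []
    for (text_a, intent_a), (text_b, intent_b) in itertools.combinations(utterances, 2):
        if intent_a == intent_b:  # only merge utterances with distinct intents
            continue
        if nsp_score(text_a, text_b) >= threshold:
            pairs.append((text_a + conjunction + text_b, [intent_a, intent_b]))
    return pairs


samples = [
    ("book a flight to boston", "BookFlight"),
    ("what is the weather there", "GetWeather"),
    ("play some jazz music", "PlayMusic"),
]
multi = build_multi_intent_pairs(samples)
```

With the stub scorer above, all three cross-intent pairs pass the threshold; with a real NSP head, only fluent continuations would survive, which is the point of using BERT here rather than concatenating utterances at random.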