IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems

📅 2026-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses interference from competing speakers in full-duplex spoken dialogue systems deployed in real-world acoustic environments. When an interfering speaker leaks into the user microphone, end-to-end dual-channel models can encode that speech as part of the user query, which leads to turn-taking confusion and degraded response quality. The authors propose Interference-Resilient Adaptive Fusion (IRAF), a lightweight, streaming-compatible module that predicts a scalar reliability gate from target-speaker and user audio embeddings and uses it to rescale user representations frame by frame before they are fused with agent embeddings. Experiments on the MS-MARCO and InstructS2S-200K datasets show consistent gains in response quality and full-duplex interaction under interfering-speaker conditions.
📝 Abstract
Full-duplex spoken dialogue models allow voice agents to listen and speak concurrently, enabling natural interaction with real-time overlap. However, end-to-end dual-channel models that jointly encode user and agent streams may degrade in realistic acoustic environments: interfering speakers leaking into the user microphone can be encoded as part of the user query, corrupting the LLM's conditioning and causing unstable turn-taking and reduced response quality. We propose Interference-Resilient Adaptive Fusion (IRAF), a lightweight, streaming-compatible module that modulates the contribution of user audio to the LLM frame by frame. IRAF predicts a scalar reliability gate from target-speaker and user audio embeddings and rescales user representations before fusion with agent embeddings. Experiments on MS-MARCO and InstructS2S-200K show consistent gains in response quality and full-duplex interaction under interfering-speaker conditions.
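The gating idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the gate network, the embedding sizes, or the fusion operator. The sketch assumes a single linear layer followed by a sigmoid as the gate, and element-wise addition as the fusion step. All names (`reliability_gate`, `gated_fusion`, `w`, `b`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def reliability_gate(user_frames, speaker_emb, w, b):
    """Predict one scalar gate per frame.

    user_frames: (T, D) frame-level user audio embeddings
    speaker_emb: (S,)   target-speaker embedding, shared by all frames
    w, b:        weights (D + S,) and bias of an ASSUMED linear gate
    """
    num_frames = user_frames.shape[0]
    # Repeat the speaker embedding so every frame sees it.
    spk = np.broadcast_to(speaker_emb, (num_frames, speaker_emb.shape[0]))
    feats = np.concatenate([user_frames, spk], axis=1)  # (T, D + S)
    return sigmoid(feats @ w + b)  # (T,)


def gated_fusion(user_frames, agent_frames, speaker_emb, w, b):
    """Rescale user frames by their gate, then fuse with agent frames.

    Fusion by addition is an assumption made for this sketch.
    """
    gate = reliability_gate(user_frames, speaker_emb, w, b)
    scaled_user = gate[:, None] * user_frames  # per-frame rescaling
    return scaled_user + agent_frames, gate


# Toy inputs with made-up dimensions.
rng = np.random.default_rng(0)
T, D, S = 4, 8, 6
user = rng.normal(size=(T, D))
agent = rng.normal(size=(T, D))
spk = rng.normal(size=S)
w = rng.normal(size=D + S)

fused, gate = gated_fusion(user, agent, spk, w, b=0.0)
```

Each gate value depends only on the current frame and the fixed speaker embedding, so the same computation can run one frame at a time in a streaming setting. A gate near 0 suppresses a frame judged unreliable, leaving mostly the agent embedding, while a gate near 1 passes the user frame through unchanged.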
Problem

Research questions and friction points this paper is trying to address.

full-duplex spoken dialogue
interfering speakers
noise-robust
end-to-end dialogue systems
acoustic interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interference-Resilient
Adaptive Fusion
Full-Duplex Spoken Dialogue
Noise-Robust
Streaming ASR