SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the language diarization challenge in multilingual mixed speech, which is characterized by frequent language switches and scarce labeled data. We propose the first end-to-end neural framework that supports an unconstrained set of languages within a single model. Methodologically, we introduce a novel architecture integrating learnable query mechanisms with a multilingual-aware network, pretrained on large-scale simulated code-switched speech data and enhanced via self-supervised learning strategies. This design significantly improves robustness and generalization to ambiguous language boundaries, low-resource languages, and cross-lingual transitions in real-world scenarios. Evaluated on multiple standard benchmarks, our approach achieves state-of-the-art performance, with a 23%–52% relative reduction in diarization error rate over prior best methods.

📝 Abstract
In this paper, we present a neural spoken language diarization model that supports an unconstrained span of languages within a single framework. Our approach integrates a learnable query-based architecture grounded in multilingual awareness, with large-scale pretraining on simulated code-switching data. By jointly leveraging these two components, our method overcomes the limitations of conventional approaches in data scarcity and architecture optimization, and generalizes effectively to real-world multilingual settings across diverse environments. Experimental results demonstrate that our approach achieves state-of-the-art performance on several language diarization benchmarks, with a relative performance improvement of 23% to 52% over previous methods. We believe that this work not only advances research in language diarization but also establishes a foundational framework for code-switching speech technologies.
Problem

Research questions and friction points this paper is trying to address.

Developing scalable language diarization for unconstrained multilingual spans
Overcoming data scarcity via simulated code-switching pretraining
Creating generalizable architecture for real-world multilingual environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses simulated code-switching data for large-scale pretraining
Integrates learnable query-based architecture with multilingual awareness
Supports unconstrained language spans in a single framework
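The core data-side idea, pretraining on simulated code-switching speech, can be illustrated with a minimal sketch. This is not the authors' actual pipeline: the function name, the `(language, start, end)` span format, and the segment-sampling policy are illustrative assumptions. The sketch concatenates monolingual segments from different language pools into one synthetic code-switched sample, forcing a language switch at every boundary and emitting frame-aligned language labels for diarization training.

```python
import random

def simulate_code_switched_sample(pools, num_segments=3, rng=None):
    """Build one simulated code-switched training sample.

    pools: dict mapping a language ID to a list of monolingual
           waveforms (here, plain lists of float samples).
    Returns (audio, spans), where `audio` is the concatenated waveform
    and `spans` is a list of (language, start_sample, end_sample)
    annotations covering it contiguously.
    """
    rng = rng or random.Random(0)
    langs = list(pools)
    audio, spans, cursor = [], [], 0
    prev_lang = None
    for _ in range(num_segments):
        # Force a switch at every segment boundary to mimic code-switching.
        lang = rng.choice([l for l in langs if l != prev_lang])
        segment = rng.choice(pools[lang])
        audio.extend(segment)
        spans.append((lang, cursor, cursor + len(segment)))
        cursor += len(segment)
        prev_lang = lang
    return audio, spans
```

In a real system the segments would be actual speech with energy normalization and cross-fades at boundaries, but the label bookkeeping, contiguous spans whose languages differ across each boundary, is the part the diarization model consumes.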
Sangmin Lee
Dept. of Electrical & Electronic Engineering, Yonsei University, Seoul, South Korea
Woongjib Choi
Dept. of Electrical & Electronic Engineering, Yonsei University, Seoul, South Korea
Jihyun Kim
Dept. of Electrical & Electronic Engineering, Yonsei University, Seoul, South Korea
Hong-Goo Kang
Yonsei University