O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization

📅 2025-12-17
🤖 AI Summary
Unsupervised clustering approaches to speaker diarization depend on hyperparameters, and current online end-to-end methods are computationally costly. To address both issues, this paper proposes O-EENC-SD, a hyperparameter-free, end-to-end online neural clustering system built on the EEND-EDA architecture. An RNN-based stitching mechanism links predictions across chunks of streaming speech, and a novel centroid refinement decoder, whose usefulness is assessed through an ablation study, refines the speaker representations used for online prediction. The system works on independent chunks with no overlap, which keeps computational complexity low. On the two-speaker conversational telephone speech of the CallHome dataset, the method is competitive with the state of the art and offers a favorable trade-off between diarization error rate (DER) and complexity.

📝 Abstract
We introduce O-EENC-SD: an end-to-end online speaker diarization system based on EEND-EDA, featuring a novel RNN-based stitching mechanism for online prediction. In particular, we develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study. Our system provides key advantages over existing methods: a hyperparameter-free solution compared to unsupervised clustering approaches, and a more efficient alternative to current online end-to-end methods, which are computationally costly. We demonstrate that O-EENC-SD is competitive with the state of the art in the two-speaker conversational telephone speech domain, as tested on the CallHome dataset. Our results show that O-EENC-SD provides a great trade-off between DER and complexity, even when working on independent chunks with no overlap, making the system extremely efficient.
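The abstract reports its results in terms of DER. As a reminder of how that metric is conventionally defined (missed speech, false alarm, and speaker confusion time, divided by total reference speech time), here is a minimal sketch. The function and parameter names are hypothetical and are not taken from the paper, and the numbers in the usage line are made up for illustration.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Conventional DER: summed error durations over total reference speech.

    All arguments are durations in the same unit (e.g. seconds).
    Names are illustrative assumptions, not the paper's notation.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Illustrative values only: 3 s missed, 2 s false alarm, 5 s confusion
# over 100 s of reference speech.
example_der = diarization_error_rate(3.0, 2.0, 5.0, 100.0)
```

Because the numerator includes false alarms, DER can exceed 1.0 when a system hallucinates a lot of speech.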

Problem

Research questions and friction points this paper is trying to address.

How to build an online end-to-end neural clustering system for speaker diarization
How to replace unsupervised clustering methods, which depend on hyperparameters, with a hyperparameter-free solution
How to provide an efficient alternative to current online end-to-end approaches, which are computationally costly

Innovation

Methods, ideas, or system contributions that make the work stand out.

RNN-based stitching mechanism for online prediction
Novel centroid refinement decoder for speaker diarization
Hyperparameter-free and efficient online end-to-end solution
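
When chunks are processed independently, the order of speakers in one chunk need not match the order in the next, so some stitching step has to map local speaker slots to global identities. The sketch below illustrates that problem with a deliberately simple, non-learned stand-in: cosine matching against running-mean centroids. It is not the paper's RNN-based stitching mechanism or its centroid refinement decoder, and the class name, method names, and embedding shapes are all assumptions made for illustration.

```python
import itertools

import numpy as np


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


class ToyCentroidStitcher:
    """Toy stand-in for chunk stitching (hypothetical, not from the paper)."""

    def __init__(self, n_speakers, dim):
        self.centroids = np.zeros((n_speakers, dim))
        self.counts = np.zeros(n_speakers, dtype=int)

    def step(self, local_emb):
        """Map local speaker slots of one chunk to global speaker indices.

        local_emb: array-like of shape (n_speakers, dim), one embedding per
        local slot. Returns a tuple p where p[i] is the global index
        assigned to local slot i.
        """
        local_emb = np.asarray(local_emb, dtype=float)
        n = len(self.centroids)
        if self.counts.sum() == 0:
            # First chunk: adopt its local order as the global order.
            perm = tuple(range(n))
        else:
            # Pick the permutation with the highest total similarity.
            perm = max(
                itertools.permutations(range(n)),
                key=lambda p: sum(
                    cosine(local_emb[i], self.centroids[p[i]]) for i in range(n)
                ),
            )
        # Running-mean update of each matched centroid.
        for i, g in enumerate(perm):
            self.counts[g] += 1
            self.centroids[g] += (local_emb[i] - self.centroids[g]) / self.counts[g]
        return perm


stitcher = ToyCentroidStitcher(n_speakers=2, dim=2)
first = stitcher.step([[1.0, 0.0], [0.0, 1.0]])   # local order adopted as global
second = stitcher.step([[0.0, 1.0], [1.0, 0.0]])  # speakers appear swapped
```

The exhaustive search over permutations is only practical for a small number of speakers, which fits the two-speaker setting discussed here; a learned mechanism such as the one the paper describes would replace this hand-written matching rule.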