🤖 AI Summary
To address the reliance on hyperparameter tuning in unsupervised clustering for speaker diarization, as well as the high computational cost of existing online end-to-end approaches, this paper proposes the first hyperparameter-free, end-to-end online neural clustering framework. The system processes streaming speech via an RNN-based chunk-stitching mechanism that models inter-chunk temporal dependencies, and introduces an adaptive centroid refinement decoder that replaces conventional unsupervised clustering modules, enabling real-time, differentiable online clustering without overlapping chunks. Built on the EEND-EDA architecture, it performs chunk-wise online inference, substantially reducing computational complexity. Evaluated on the two-speaker CallHome dataset, the method is competitive with the state of the art while achieving a favorable trade-off between diarization error rate (DER) and inference efficiency.
📝 Abstract
We introduce O-EENC-SD: an end-to-end online speaker diarization system based on EEND-EDA, featuring a novel RNN-based stitching mechanism for online prediction. In particular, we develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study. Our system provides key advantages over existing methods: it is hyperparameter-free, unlike unsupervised clustering approaches, and more efficient than current online end-to-end methods, which are computationally costly. We demonstrate that O-EENC-SD is competitive with the state of the art on two-speaker conversational telephone speech, as tested on the CallHome dataset. Our results show that O-EENC-SD offers a strong trade-off between DER and computational complexity, even when operating on independent, non-overlapping chunks, making the system highly efficient.
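To make the two core ideas concrete, here is a minimal, illustrative sketch (not the authors' code) of chunk-wise online clustering: a toy recurrent state is carried across non-overlapping chunks (standing in for the RNN-based stitching mechanism), and speaker centroids are refined online as frames stream in (standing in for the learned centroid refinement decoder). All function names, the GRU-style update, and the EMA refinement rule are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_SPK = 8, 2  # embedding size; two-speaker setting as in CallHome

def stitch_state(h_prev, chunk_summary, W, U):
    """Toy recurrent update: carry inter-chunk temporal context forward."""
    return np.tanh(W @ h_prev + U @ chunk_summary)

def refine_centroids(centroids, frames, momentum=0.9):
    """Assign each frame to its nearest centroid, then nudge that centroid
    toward the frame with an exponential moving average (a simple stand-in
    for a learned, differentiable centroid refinement step)."""
    for x in frames:
        k = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        centroids[k] = momentum * centroids[k] + (1 - momentum) * x
    return centroids

# Two well-separated synthetic "speakers" streamed in non-overlapping chunks.
means = np.stack([np.full(DIM, -2.0), np.full(DIM, 2.0)])
W = 0.1 * rng.standard_normal((DIM, DIM))
U = 0.1 * rng.standard_normal((DIM, DIM))
h = np.zeros(DIM)
centroids = rng.standard_normal((N_SPK, DIM))

for _ in range(20):  # each iteration = one incoming chunk, processed once
    labels = rng.integers(0, N_SPK, size=10)
    frames = means[labels] + 0.1 * rng.standard_normal((10, DIM))
    centroids = refine_centroids(centroids, frames)
    h = stitch_state(h, frames.mean(axis=0), W, U)  # state for the next chunk

print(np.round(np.sort(centroids.mean(axis=1)), 1))
```

Because each chunk is processed exactly once and only the small recurrent state is retained, cost grows linearly with stream length, which is the efficiency property the abstract highlights for non-overlapping chunk-wise inference.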