AI Summary
This work addresses the degradation of speaker verification performance caused by age-related vocal change, a problem made harder to study by the scarcity of large-scale longitudinal speech datasets. To bridge this gap, the authors introduce VoxKnesset, the first large-scale Hebrew longitudinal speech corpus: approximately 2,300 hours of parliamentary speech from 393 speakers, recorded between 2009 and 2025 with per-speaker recording spans of up to 15 years, and accompanied by aligned transcripts and official demographic metadata. Benchmarks with WavLM-Large, ECAPA-TDNN, and Wav2Vec2-XLSR-1B show that the speaker verification equal error rate (EER) of the strongest model rises from 2.15% to 4.58% over 15 years. Age regressors trained cross-sectionally fail to capture within-speaker aging, whereas longitudinally trained regressors recover a meaningful temporal signal. VoxKnesset thus provides a resource for research on longitudinal speaker verification and age modeling.
Abstract
Speech processing systems face a fundamental challenge: the human voice changes with age, yet few datasets support rigorous longitudinal evaluation. We introduce VoxKnesset, an open-access dataset of ~2,300 hours of Hebrew parliamentary speech spanning 2009-2025, comprising 393 speakers with recording spans of up to 15 years. Each segment includes aligned transcripts and verified demographic metadata from official parliamentary records. We benchmark modern speech embeddings (WavLM-Large, ECAPA-TDNN, Wav2Vec2-XLSR-1B) on age prediction and speaker verification under longitudinal conditions. Speaker verification EER rises from 2.15% to 4.58% over 15 years for the strongest model, and cross-sectionally trained age regressors fail to capture within-speaker aging, while longitudinally trained models recover a meaningful temporal signal. We publicly release the dataset and pipeline to support aging-robust speech systems and Hebrew speech processing.
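For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be computed from trial scores. It is not the authors' evaluation pipeline: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all illustrative assumptions.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    target_scores: similarity scores for same-speaker trials.
    nontarget_scores: similarity scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: a different-speaker trial scores at or above t.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: a same-speaker trial scores below t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their midpoint as the EER estimate.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical, not taken from VoxKnesset).
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(same_speaker, different_speaker))
```

In a longitudinal setting, one plausible use is to group same-speaker trials by the time gap between the two recordings and compute a separate EER per group, which is the kind of comparison the 2.15% versus 4.58% figures describe.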