Revisiting Lexicon Evaluation in Unsupervised Word Discovery

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a limitation of the evaluation metrics commonly used in unsupervised word discovery. Normalized edit distance, which averages phoneme edit distances between the discovered units in each cluster, is biased toward the quality of large clusters and ignores how true word types are distributed across clusters. Drawing on established clustering theory, the work proposes two metrics: a modified metric that accounts for cluster size when assessing within-cluster consistency, and an inverse metric that measures how true words are spread across clusters. Experiments on synthetic and real-world lexicons show that, used together, these metrics correlate more closely with how similar a lexicon is to the ground-truth distribution and are more robust to the biases that skew lexicon evaluations.

📝 Abstract

Building a lexicon from discovered word-like units is a central goal in zero-resource speech processing. But do our evaluations provide a trustworthy indication of lexicon quality? A common metric, normalized edit distance, averages the phoneme edit distances between discovered units in each cluster. We show that this metric has an inherent bias toward the quality of large clusters, inhibiting fair evaluation. Moreover, it ignores how well true classes are distributed across clusters. Based on established theory in clustering literature, we propose two metrics that address these shortcomings: a modified metric that weighs cluster size when assessing within-cluster consistency, and an inverse metric that assesses how true words are spread across clusters. Through experiments on synthetic and real-world lexicons, we demonstrate that combined, these metrics are: (1) more closely correlated with how similar a lexicon is to the ground-truth distribution, and (2) more robust to biases that skew lexicon evaluations.
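
The large-cluster bias described above can be illustrated with a small sketch. It assumes that normalized edit distance (NED) is the phoneme edit distance divided by the length of the longer sequence, and that the common metric pools all within-cluster pairs before averaging. The function names and the token-count weighting in `size_weighted_ned` are hypothetical choices made for illustration; they are not taken from the paper, whose exact definitions are not given here.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def ned(a, b):
    """Edit distance normalized by the longer sequence (assumed convention)."""
    return edit_distance(a, b) / max(len(a), len(b))


def cluster_pair_neds(cluster):
    """NED for every unordered pair of tokens in one cluster."""
    return [ned(a, b) for a, b in combinations(cluster, 2)]


def pooled_ned(clusters):
    """Average NED over all within-cluster pairs, pooled across clusters."""
    pairs = [d for c in clusters for d in cluster_pair_neds(c)]
    return sum(pairs) / len(pairs)


def size_weighted_ned(clusters):
    """Per-cluster mean NED weighted by token count (illustrative assumption)."""
    total, n_tokens = 0.0, 0
    for c in clusters:
        d = cluster_pair_neds(c)
        if not d:  # singleton clusters have no pairs and are skipped here
            continue
        total += len(c) * (sum(d) / len(d))
        n_tokens += len(c)
    return total / n_tokens


# One large, perfectly consistent cluster and one small, fully inconsistent one.
clusters = [
    [("k", "ae", "t")] * 10,
    [("d", "ao", "g"), ("s", "ih", "t")],
]
print(round(pooled_ned(clusters), 3))         # 0.022
print(round(size_weighted_ned(clusters), 3))  # 0.167
```

In this toy lexicon the large cluster contributes 45 of the 46 pairs, so the pooled score is close to zero even though one of the two clusters is entirely wrong. Weighting each cluster's mean by its token count grows linearly instead of quadratically with cluster size, which is one way the dominance of large clusters could be reduced.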

Problem

Research questions and friction points this paper is trying to address.

lexicon evaluation

unsupervised word discovery

normalized edit distance

clustering bias

zero-resource speech processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

lexicon evaluation

unsupervised word discovery

clustering metrics