Unstable Grounds for Beautiful Trees? Testing the Robustness of Concept Translations in the Compilation of Multilingual Wordlists

📅 2025-03-01


🤖 AI Summary

This study tests how robust concept translation is when multilingual wordlists are compiled, a step that underlies computational work on language subgrouping and divergence time estimation. It compares independently compiled wordlists from 10 dataset pairs covering 9 language families and measures how often the sources agree. On average, only 83% of translations yield the same word form, and only 23% are identical in phonetic transcription. The findings indicate that variation in concept translation is a source of uncertainty that should be considered when assessing phylogenetic studies and the conclusions drawn from them.


📝 Abstract

Multilingual wordlists play a crucial role in comparative linguistics. While many studies have been carried out to test the power of computational methods for language subgrouping or divergence time estimation, few studies have put the data upon which these studies are based to a rigorous test. Here, we conduct a first experiment that tests the robustness of concept translation as an integral part of the compilation of multilingual wordlists. Investigating the variation in concept translations in independently compiled wordlists from 10 dataset pairs covering 9 different language families, we find that on average, only 83% of all translations yield the same word form, while identical forms in terms of phonetic transcriptions can only be found in 23% of all cases. Our findings can prove important when trying to assess the uncertainty of phylogenetic studies and the conclusions derived from them.
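The two agreement rates reported above can be illustrated with a minimal sketch. It assumes each wordlist maps a (language, concept) slot to a pair of word form and phonetic transcription, and it treats two translations as the same word when their orthographic forms match exactly. Both the data layout and that matching criterion are assumptions made for illustration, not the procedure used in the study, and the toy entries are invented.

```python
from typing import Dict, Tuple

# Hypothetical layout: (language, concept) -> (word form, phonetic transcription)
Wordlist = Dict[Tuple[str, str], Tuple[str, str]]


def agreement(a: Wordlist, b: Wordlist) -> Tuple[float, float]:
    """Return (share of same word forms, share of identical transcriptions)
    over the (language, concept) slots present in both wordlists."""
    shared = set(a) & set(b)
    if not shared:
        raise ValueError("the wordlists share no (language, concept) slots")
    # Assumption: exact string match on the form stands in for "same word".
    same_word = sum(a[k][0] == b[k][0] for k in shared)
    same_transcription = sum(a[k][1] == b[k][1] for k in shared)
    return same_word / len(shared), same_transcription / len(shared)


# Invented toy data for two independently compiled lists of one language.
list_a: Wordlist = {
    ("German", "hand"): ("Hand", "hant"),
    ("German", "dog"): ("Hund", "hʊnt"),
    ("German", "stone"): ("Stein", "ʃtaɪn"),
    ("German", "small"): ("klein", "klaɪn"),
}
list_b: Wordlist = {
    ("German", "hand"): ("Hand", "hant"),
    ("German", "dog"): ("Hund", "hunt"),
    ("German", "stone"): ("Stein", "ʃtain"),
    ("German", "small"): ("wenig", "veːnɪç"),
}

word_rate, transcription_rate = agreement(list_a, list_b)
print(f"same word form: {word_rate:.0%}, identical transcription: {transcription_rate:.0%}")
```

In this toy case three of four slots share a word form but only one has an identical transcription, which mirrors the kind of gap between the two measures that the abstract describes.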

Problem

Research questions and friction points this paper is trying to address.

Tests robustness of concept translations in multilingual wordlists.

Investigates variation in translations across 10 dataset pairs.

Assesses uncertainty in phylogenetic studies and their conclusions.

Innovation

Methods, ideas, or system contributions that make the work stand out.

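Conducts a first experiment on the robustness of concept translation in wordlist compilation.

Compares independently compiled wordlists from 10 dataset pairs covering 9 language families.

Reports that, on average, only 83% of translations yield the same word form and only 23% are identical in phonetic transcription.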
