🤖 AI Summary
This study addresses the ambiguity of numeral strings on partially deciphered proto-Elamite (PE) accounting tablets. The ambiguity arises because several counting systems coexist, so a single written numeral can have up to four distinct readings. The goal is to determine the quantity each numeral records and to express it in modern Hindu-Arabic notation. Methodologically, the study implements an automated conversion from PE notation, proposes disambiguation techniques based on structural properties of the tablets and on bootstrapped classifiers with a novel cautious rule-selection approach, and constructs a test set for evaluating disambiguation techniques. The analysis confirms existing intuitions about the PE numeral systems and reveals previously unknown correlations between tablet content and numeral magnitude. Because the corpus is heavily accounting-focused and contains many more numeric tokens than tokens of text, these results support the broader effort to understand and decipher proto-Elamite.
📝 Abstract
A numeration system encodes abstract numeric quantities as concrete strings of written characters. The numeration systems used by modern scripts tend to be precise and unambiguous, but this was not so for the ancient and partially deciphered proto-Elamite (PE) script, where written numerals can have up to four distinct readings depending on the system that is used to read them. We consider the task of disambiguating between these readings in order to determine the values of the numeric quantities recorded in this corpus. We contribute an automated conversion from PE notation to modern Hindu-Arabic notation, as well as two disambiguation techniques based on structural properties of the original documents and classifiers learned with the bootstrapping algorithm. We also contribute a test set for evaluating disambiguation techniques, as well as a novel approach to cautious rule selection for bootstrapped classifiers. Our analysis confirms existing intuitions about this script and reveals previously unknown correlations between tablet content and numeral magnitude. This work is crucial to understanding and deciphering PE, as the corpus is heavily accounting-focused and contains many more numeric tokens than tokens of text.
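To make the ambiguity concrete, the following minimal sketch enumerates every candidate reading of one numeral under several counting systems. It is an illustration under stated assumptions, not the conversion described in the paper: the system names, sign names, and sign values are hypothetical toy data, not the actual proto-Elamite sign inventory.

```python
from typing import Dict, List, Tuple

# Hypothetical toy counting systems. Sign names and values are illustrative
# assumptions only; they are NOT the real proto-Elamite systems.
SYSTEMS: Dict[str, Dict[str, int]] = {
    "toy_decimal": {"unit": 1, "mid": 10, "big": 100},
    "toy_sexagesimal": {"unit": 1, "mid": 10, "big": 60},
    "toy_capacity": {"unit": 1, "mid": 6},
}

# A numeral is a sequence of (sign, repetition count) pairs.
Numeral = List[Tuple[str, int]]


def candidate_readings(numeral: Numeral) -> Dict[str, int]:
    """Return the value of `numeral` under every system that can read it.

    A system can read a numeral only if it assigns a value to every sign
    that appears; otherwise that system is ruled out.
    """
    readings: Dict[str, int] = {}
    for name, values in SYSTEMS.items():
        if all(sign in values for sign, _ in numeral):
            readings[name] = sum(values[sign] * count for sign, count in numeral)
    return readings


if __name__ == "__main__":
    # The sign "big" excludes toy_capacity, leaving two competing readings.
    print(candidate_readings([("big", 2), ("mid", 3), ("unit", 4)]))
    # All three toy systems can read this numeral, but they disagree on its value.
    print(candidate_readings([("mid", 1), ("unit", 2)]))
```

In this toy setting the sign inventory alone sometimes rules out a system, but often leaves several readings. That remaining ambiguity is the kind of case the disambiguation techniques in the abstract are meant to resolve.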