Targum -- A Multilingual New Testament Translation Corpus

📅 2026-02-10
🤖 AI Summary
This study addresses a gap in existing multilingual corpora, which prioritize broad language coverage at the expense of historical depth, particularly for Bible translations in European languages, and thereby limit investigations into how translations evolve. To close this gap, we construct a multilingual corpus of 657 New Testament translations, 352 of which are unique versions, with focused coverage of English, French, Italian, Polish, and Spanish. The data are aggregated from twelve online biblical libraries and one existing corpus, and each translation is manually annotated with a work identifier, an edition label, and a revision year, which enables precise alignment and deduplication. The corpus supports user-defined notions of "uniqueness," allowing quantitative analyses at several granularities, from individual translation families to cross-version comparisons, and is presented as a new benchmark for research in translation history.

📝 Abstract
Many European languages possess rich biblical translation histories, yet existing corpora, in prioritizing linguistic breadth, often fail to capture this depth. To address this gap, we introduce a multilingual corpus of 657 New Testament translations, of which 352 are unique, with unprecedented depth in five languages: English (208 unique versions from 396 total), French (41 from 78), Italian (18 from 33), Polish (30 from 48), and Spanish (55 from 102). Aggregated from 12 online biblical libraries and one preexisting corpus, each translation is manually annotated with metadata that maps the text to a standardized identifier for the work, its specific edition, and its year of revision. This canonicalization lets researchers define "uniqueness" for their own needs: they can perform micro-level analyses on translation families, such as the KJV lineage, or conduct macro-level studies by deduplicating closely related texts. By providing the first resource designed for such flexible, multilevel analysis, our corpus establishes a new benchmark for the quantitative study of translation history.
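The idea of user-defined uniqueness can be illustrated with a minimal Python sketch. It is not the corpus's actual schema or tooling: the `Translation` record, its field names, the `group_by_uniqueness` helper, and the toy records (including the source and edition labels) are all assumptions made for illustration. The sketch groups records by whichever metadata fields the researcher treats as defining a unique text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """Hypothetical metadata record; field names are assumptions."""
    source: str         # library the text was collected from
    language: str       # language code
    work_id: str        # standardized identifier for the work
    edition: str        # specific edition label
    revision_year: int  # year of revision


def group_by_uniqueness(translations, key_fields):
    """Group records that share the same values for key_fields."""
    groups = defaultdict(list)
    for t in translations:
        key = tuple(getattr(t, field) for field in key_fields)
        groups[key].append(t)
    return dict(groups)


# Toy records, invented for illustration only.
records = [
    Translation("library_a", "eng", "KJV", "Cambridge", 1769),
    Translation("library_b", "eng", "KJV", "Cambridge", 1769),
    Translation("library_a", "eng", "KJV", "Original", 1611),
    Translation("library_c", "eng", "ASV", "Standard", 1901),
]

# Micro level: every edition and revision counts as its own text.
fine = group_by_uniqueness(records, ("work_id", "edition", "revision_year"))

# Macro level: all revisions of a work collapse into one entry.
coarse = group_by_uniqueness(records, ("work_id",))

print(len(fine), len(coarse))
```

Under these assumptions, the same four records yield three unique texts at the fine level and two at the coarse level, so the choice of key fields alone determines what counts as a duplicate.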
Problem

Research questions and friction points this paper is trying to address.

multilingual corpus
New Testament translation
translation history
biblical translation
corpus depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual corpus
translation history
manual annotation
canonicalization
deduplication