Evaluating OpenAI GPT Models for Translation of Endangered Uralic Languages: A Comparison of Reasoning and Non-Reasoning Architectures

📅 2025-12-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study investigates the feasibility and willingness of OpenAI’s GPT-series large language models to translate between Finnish and four endangered, low-resource Uralic languages—Komi-Zyrian, Moksha, Erzya, and Udmurt. Method: Employing a novel rejection-rate analysis based on parallel literary corpora, we conduct the first systematic comparison of reasoning-based (e.g., o1) versus non-reasoning-based (e.g., GPT-4) architectures in machine translation for such languages. Contribution/Results: Reasoning architectures significantly reduce translation refusal rates—by up to 16 percentage points—demonstrating markedly higher attempt propensity and adaptability in low-resource, endangered-language settings. These findings provide critical empirical support for AI-assisted digital archiving and revitalization of endangered languages, while revealing that architectural distinctions—particularly the integration of chain-of-thought reasoning—substantially influence model performance on linguistically under-resourced tasks. The results underscore the importance of architecture-aware evaluation in low-resource NLP and highlight reasoning capabilities as a key determinant of model robustness in minority-language translation.

📝 Abstract
The evaluation of Large Language Models (LLMs) for translation tasks has primarily focused on high-resource languages, leaving a significant gap in understanding their performance on low-resource and endangered languages. This study presents a comprehensive comparison of OpenAI's GPT models, specifically examining the differences between reasoning and non-reasoning architectures for translating between Finnish and four low-resource Uralic languages: Komi-Zyrian, Moksha, Erzya, and Udmurt. Using a parallel corpus of literary texts, we evaluate model willingness to attempt translation through refusal rate analysis across different model architectures. Our findings reveal significant performance variations between reasoning and non-reasoning models, with reasoning models showing 16 percentage points lower refusal rates. The results provide valuable insights for researchers and practitioners working with Uralic languages and contribute to the broader understanding of reasoning model capabilities for endangered language preservation.
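The refusal-rate analysis described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the refusal markers, example outputs, and function names are all hypothetical, and a real study would need a more careful refusal classifier.

```python
# Minimal sketch of a refusal-rate analysis over collected model outputs.
# The marker list below is illustrative only, not the paper's criteria.
REFUSAL_MARKERS = (
    "i'm sorry",
    "i cannot translate",
    "unable to translate",
)

def is_refusal(output: str) -> bool:
    """Heuristically flag an output as a refusal to translate."""
    text = output.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(outputs: list[str]) -> float:
    """Share of outputs that refuse the translation task, in percent."""
    if not outputs:
        return 0.0
    refused = sum(is_refusal(o) for o in outputs)
    return 100.0 * refused / len(outputs)

# Comparing two (hypothetical) architectures on the same prompts:
reasoning_outputs = ["Кывбур гижӧма...", "I cannot translate this text."]
non_reasoning_outputs = [
    "I'm sorry, I can't help with that.",
    "Unable to translate low-resource content.",
]
gap = refusal_rate(non_reasoning_outputs) - refusal_rate(reasoning_outputs)
print(f"refusal-rate gap: {gap:.0f} percentage points")  # → 50
```

The same per-architecture comparison, run over the full parallel corpus, yields the percentage-point gap the paper reports.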
Problem

Research questions and friction points this paper is trying to address.

LLM translation evaluation has focused on high-resource languages, leaving endangered languages underexplored
Unclear how reasoning and non-reasoning architectures differ on low-resource translation
Model willingness to attempt translation (refusal behavior) is unmeasured for Uralic languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

First systematic comparison of reasoning versus non-reasoning GPT architectures for endangered Uralic language translation
Novel rejection-rate analysis over a parallel corpus of Finnish and Uralic literary texts
Refusal rates used as a measure of model willingness to attempt translation
Yehor Tereshchenko
Metropolia University of Applied Sciences
Mika Hämäläinen
Metropolia University of Applied Sciences
NLP · NLG · computational creativity · endangered languages · digital humanities
Svitlana Myroniuk
University of Helsinki