🤖 AI Summary
This study systematically evaluates ChatGPT (GPT-3.5 and GPT-4) on Japanese–English translation, benchmarking it against widely used commercial engines (e.g., Google Translate, DeepL) and contrasting document-level with sentence-level translation. It also examines the effect of simple versus context-enhanced prompting on output quality. Evaluation combines automatic metrics (BLEU, COMET) with human annotation under the MQM framework. Results show: (1) document-level translation substantially outperforms sentence-level translation, improving coherence and coreference consistency; (2) the experiments could not establish that enhanced prompts outperform simple prompts; (3) automatic metrics favor GPT-3.5, but human evaluation reveals a trade-off between accuracy (GPT-3.5) and fluency (GPT-4); and (4) ChatGPT is competitive with well-known commercial translation systems. The work provides empirical evidence, specifically for Japanese–English, of LLMs’ document-level translation advantage and highlights trade-offs between model versions in translation quality.
📝 Abstract
This study investigates ChatGPT for Japanese–English translation, exploring simple and enhanced prompts and comparing against commercially available translation engines. Using both automatic and MQM-based human evaluations, we found that document-level translation outperforms sentence-level translation for ChatGPT. On the other hand, our experiments could not determine whether enhanced prompts performed better than simple prompts. We also found that ChatGPT-3.5 was preferred by automatic evaluation, but that a trade-off exists between accuracy (ChatGPT-3.5) and fluency (ChatGPT-4). Lastly, ChatGPT yields competitive results against two widely known translation systems.
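The contrast between sentence-level and document-level translation comes down to how much source context the prompt exposes to the model. The sketch below illustrates the idea; the prompt wordings and function names are illustrative assumptions, not the paper's actual prompts.

```python
def sentence_level_prompt(src_sentence: str) -> str:
    """One prompt per sentence: the model sees no surrounding context,
    so pronouns like それ ('it') may lack an antecedent."""
    return (
        "Translate the following Japanese sentence into English:\n"
        f"{src_sentence}"
    )

def document_level_prompt(src_sentences: list[str]) -> str:
    """One prompt for the whole document: the model can resolve
    coreference and keep terminology consistent across sentences."""
    doc = "\n".join(src_sentences)
    return (
        "Translate the following Japanese document into English, "
        "keeping the translation coherent across sentences:\n"
        f"{doc}"
    )

# Hypothetical two-sentence document (not from the paper's test data):
document = ["猫が部屋に入った。", "それからソファで眠った。"]
print(sentence_level_prompt(document[1]))  # それ has no antecedent here
print(document_level_prompt(document))     # antecedent is available
```

In the sentence-level case the second sentence's subject is only recoverable from the first sentence, which the prompt omits; this is the kind of coreference gap that the document-level setting avoids.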