ChatGPT as a Translation Engine: A Case Study on Japanese-English

📅 2025-10-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study systematically evaluates ChatGPT (v3.5 and v4) on Japanese–English translation, benchmarking against leading commercial engines (e.g., Google Translate, DeepL) and distinguishing document-level versus sentence-level translation performance. It further investigates the impact of simple versus context-augmented prompting on output quality. Evaluation employs both automatic metrics (BLEU, COMET) and human annotation via the MQM framework. Results show: (1) Document-level context substantially improves coherence and coreference consistency; (2) GPT-3.5 prioritizes accuracy, whereas GPT-4 excels in fluency and stylistic appropriateness; (3) With optimized context-aware prompting, GPT-4 achieves parity with state-of-the-art commercial systems. This work provides the first empirical validation—specifically for Japanese–English—of LLMs’ document-level translation advantages and uncovers critical trade-offs between prompting strategies and model versions in translation quality.
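The document-level versus sentence-level distinction above comes down to how the input is packaged into prompts: sentence-level translation sends each sentence in its own independent request (so the model sees no surrounding context), while document-level translation sends the whole passage at once. A minimal sketch of that difference — the instruction wording and function name are hypothetical, not the paper's actual prompts:

```python
def build_prompts(sentences, mode="document"):
    """Build translation prompts for a list of Japanese sentences.

    mode="sentence": one independent prompt per sentence (no cross-sentence
    context, so coreference across sentences is invisible to the model).
    mode="document": a single prompt containing the whole passage, giving the
    model document-level context for coherence and pronoun resolution.
    """
    instruction = "Translate the following Japanese text into English."
    if mode == "sentence":
        return [f"{instruction}\n\n{s}" for s in sentences]
    # Document-level: join all sentences into one prompt.
    return [instruction + "\n\n" + "\n".join(sentences)]
```

Each returned string would then be sent to the model as a single request; the paper's finding is that the document-level packaging yields more coherent output.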

📝 Abstract
This study investigates ChatGPT for Japanese-English translation, exploring simple and enhanced prompts and comparing against commercially available translation engines. Performing both automatic and MQM-based human evaluations, we found that document-level translation outperforms sentence-level translation for ChatGPT. On the other hand, we were not able to determine if enhanced prompts performed better than simple prompts in our experiments. We also discovered that ChatGPT-3.5 was preferred by automatic evaluation, but a tradeoff exists between accuracy (ChatGPT-3.5) and fluency (ChatGPT-4). Lastly, ChatGPT yields competitive results against two widely-known translation systems.
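The automatic evaluation mentioned in the abstract scores hypothesis translations against human references with metrics such as BLEU and COMET. As a rough illustration of the BLEU idea only — a simplified, unsmoothed single-reference reimplementation; in practice one would use an established library such as sacreBLEU:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU: modified n-gram precision plus brevity penalty.

    hyps/refs are parallel lists of whitespace-tokenizable strings,
    one reference per hypothesis. Returns a score in [0, 100].
    """
    match = [0] * max_n   # clipped n-gram matches, per order
    total = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = Counter(ngrams(h, n)), Counter(ngrams(r, n))
            match[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:  # no smoothing: any empty order zeroes the score
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

Note that for Japanese–English specifically, BLEU is sensitive to tokenization, and COMET (a learned metric) often correlates better with the MQM human judgments the paper also reports.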
Problem

Research questions and friction points this paper is trying to address.

Evaluating ChatGPT's performance for Japanese-English document translation
Comparing simple versus enhanced prompts for translation quality
Assessing ChatGPT against commercial translation systems using MQM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Document-level translation outperforms sentence-level translation
Comparison of enhanced versus simple prompts was inconclusive
ChatGPT yields competitive results against commercial systems