Translators as Invisible Teachers of AI: Copyright, Translation Memory, and the Political Economy of Linguistic Data

📅 2026-05-23



🤖 AI Summary

This study examines how translation labor is systematically incorporated into AI training data without acknowledgment or compensation, producing structural injustices at the intersection of copyright law and political economy. It compares the legal frameworks of Japan, the European Union, and the United States, traces the supply chains behind translation memories and parallel corpora, and examines data practices in both open and proprietary large language models. On this basis, the paper introduces two original concepts: “appropriation without consumption” and the “invisible teacherisation” of translators. Together, these concepts explain how human labor is institutionally erased in AI development. The research underscores the critical role of translators at a time when model collapse has made human-generated data a premium resource, and it advances a redistributive approach to data governance grounded in that recognition.

📝 Abstract

This paper examines how the labour of translators has been transformed into foundational data capital for the age of artificial intelligence (AI). Translation memories (TM) and parallel corpora preserve a one-to-one correspondence between source and target text and therefore constitute extraordinarily valuable supervised training data for machine translation. The development of statistical machine translation (SMT), neural machine translation (NMT), the Transformer architecture, and multilingual large language models (LLMs) cannot be disentangled from the accumulation of such translation data. And yet translators' renditions have been bought as deliverables under contract, segmented as technical objects, and processed as "information analysis" data under copyright law, thereby losing their moral, creative, and economic attribution to the translators who produced them. The paper develops two concepts to capture this process. The first is appropriation without consumption: a mode of use in which works are not read, viewed, or listened to, but only mined for statistical features, a use legitimated under Article 30-4 of the Japanese Copyright Act. The second is the invisible teacherisation of translators: the process by which translators, through the construction of translation memories, post-editing, and quality assessment, have functioned as teachers of AI without being recognised as such. Drawing on the data supply chain that runs from translators through language service providers (LSPs) and platforms to model developers, on a comparative reading of Japanese, European, and United States legal frameworks, on the distinction between open and proprietary AI models, and on the premium status that human-generated data has acquired in the era of model collapse, the paper asks what translators are actually afraid of and points toward concrete directions for redistributive design.
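To make concrete why a translation memory is "supervised training data" in the sense described above, the following is a minimal sketch, assuming the TM is stored in TMX 1.4 (the XML Translation Memory eXchange format). The sample segments, the English-Japanese language pair, the `creationid` values, and the helper `tmx_to_pairs` are all hypothetical and for illustration only; this is not a pipeline described in the paper. It shows each translation unit being reduced to a bare (source, target) pair, the shape machine translation systems train on, while attribution metadata such as the translator's `creationid` is simply not carried forward.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml:lang attribute to this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX 1.4 sample; creationid records which translator produced each unit.
TMX_SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="example" creationtoolversion="0.1" o-tmf="none"/>
  <body>
    <tu creationid="translator_a">
      <tuv xml:lang="en"><seg>The contract ends in May.</seg></tuv>
      <tuv xml:lang="ja"><seg>契約は5月に終了します。</seg></tuv>
    </tu>
    <tu creationid="translator_b">
      <tuv xml:lang="en"><seg>Please sign here.</seg></tuv>
      <tuv xml:lang="ja"><seg>こちらに署名してください。</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Hypothetical helper: flatten TMX translation units into (source, target) pairs.

    Only the segment text survives; tu-level metadata such as creationid
    (the translator's identity) is read by the parser but never returned.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            seg = tuv.find("seg")
            if lang and seg is not None:
                segs[lang] = "".join(seg.itertext()).strip()
        # Keep a unit only if both sides of the language pair are present.
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


if __name__ == "__main__":
    for source, target in tmx_to_pairs(TMX_SAMPLE, "en", "ja"):
        print(source, "->", target)
```

The design choice here is deliberately naive: a real corpus-building pipeline may also filter, deduplicate, or shuffle segments, but the structural point is the same. Once a unit becomes a bare pair, nothing in the training data links it back to the translator who produced it.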

Problem

Research questions and friction points this paper is trying to address.

translation memory

invisible teacherisation

linguistic data

AI training data

Innovation

Methods, ideas, or system contributions that make the work stand out.

translation memory

invisible teacherisation

appropriation without consumption

human-generated data