AI Translation and Linguistic Colonialism: The New Role of the Translator

When Algorithms Translate: Linguistic Colonialism, Data Sovereignty, and the New Role of the Translator

This post is based on a talk I delivered at the panel “Translation and the Role of the Translator in Digital Communication,” organized by the Translation Association of Turkey (Çeviri Derneği) at Istanbul Aydın University on March 13, 2026.


In Turkish, the pronoun “o” is gender-neutral. “O bir doktor” — he or she is a doctor. “O bir hemşire” — he or she is a nurse. For years, Google Translate resolved this ambiguity the same way, every time: the doctor became male, the nurse became female. Automatically, silently, across millions of translations.

This was not a typo. Social stereotypes were being reproduced with mathematical precision. The same pattern held when translating into Spanish, Russian, and German. Google acknowledged the problem and issued a fix in 2018, but only for a handful of languages, and even there only partially.

The question this raises is not a technical one. It is political: whose values are encoded in the algorithm? And if the algorithm is translating the world, whose world is it translating?

My argument is simple: in the digital age, translation is no longer an invisible bridge between languages. It is an architectural intervention — a structural shaping of meaning, power, and knowledge.


The Algorithm and Its Blind Spots

The gender bias in Google Translate is a symptom of a deeper condition. Large language models, the engines powering today’s translation tools, are built on profoundly unequal data foundations. According to the figures published by their developers, approximately 93% of GPT-3’s training text (by word count) is in English. For LLaMA 2, the figure is around 90%. The remaining languages of the world share the scraps.

This is not a neutral technical fact. It is an epistemic hierarchy baked into infrastructure. And it has consequences beyond mere accuracy.

Models trained on English-dominant corpora tend to use English as an internal pivot language — meaning they do not “think” in Turkish or Arabic or Swahili. They think in English and translate outward. Researchers have described this as LLMs having a “foreign accent” in non-English languages. The output is grammatically acceptable but culturally estranged, subtly off-register, hollowed of local idiom.

Meanwhile, more than 75% of major LLM benchmarks are designed for English-language tasks. Performance in other languages is measured as an afterthought. The effect is a systemic flattening: models optimized for “average” and “safe” language production gradually sand down local expressions, ironies, and cultural resonances.

This is not a translation error. This is the smoothing-out of language itself.

I would call this linguistic colonialism — but with an important caveat. It is not a conspiracy. Nobody is deliberately erasing Turkish or Māori or Swahili. The mechanism is more insidious: data hierarchies, market logic, and engineering decisions combine to produce a structural outcome. As I often say about digital colonialism more broadly: colonialism operates not through intention but through infrastructure.


Resistance: Two Cases

Can anything be done? I want to offer two examples — one from the other side of the world, one closer to home.

In the far north of New Zealand, in a small, economically depressed town, eight GPUs hum inside a run-down building. This is where Te Hiku Media — a Māori broadcasting organization — has been building its own language models from the community’s 30-year audio archive. The resulting system transcribes te reo Māori with 92% accuracy. It was built by and for the Māori people.

But the technology is not the real story. What makes Te Hiku’s work remarkable is its governance framework. All partnerships are protected by what they call the Kaitiakitanga license — a data sovereignty instrument ensuring that the community retains control over its data, and that the data cannot be used in applications that surveil, discriminate against, or otherwise harm Māori people. When a US-based translation company approached them to collect voice data for commercial purposes, Te Hiku refused. As one of its founders put it: “Data is the new land. Having had our land taken from us, we take data sovereignty very seriously.”

This is not a technology project. It is a survival struggle on a new historical terrain.

In Turkey, similar anxieties have produced similar responses. T3 AI — developed by the T3 Foundation in collaboration with Baykar — has been launched as an open-source Turkish language model, operating under the slogan “ethical AI.” TÜBİTAK BİLGEM is pursuing a parallel Turkish LLM initiative. These projects express a genuine desire to build language infrastructure that “thinks in Turkish” — that preserves local proverbs, idioms, and cultural reference.

But here I want to press on a critical question: does “sovereign AI” automatically mean liberatory AI?

The Māori case offers one model: community-led, transparent, defensively oriented against corporate extraction. The Turkish case is more complex. State institutions, the defense industry, and major media organizations are among the stakeholders in these projects. Whether this produces linguistic sovereignty or a different form of power concentration is a question worth sitting with.

Sovereign AI always carries the question of whose sovereignty. This is uncomfortable — but it needs to be asked.


The Translator’s New Role

Where does the translator stand in all of this?

For decades, the dominant ideal in translation studies was the translator’s invisibility. A good translation, the canon held, erases its own traces. The translator disappears into the text. But in the digital ecosystem, this invisibility is no longer a virtue — it is a vulnerability.

The translator no longer stands between two languages. She stands between the human and the machine.

I want to suggest three new roles that are already emerging in practice.

The first is what we might call the cyborg intermediary. In financial, legal, and technical domains, translators are no longer working purely with text. They work with data flows, interfaces, and systems. This is not just a technical evolution — it is an identity question. “Am I still a translator?” is a real and unresolved professional anxiety.

The second is the transeditor. In news translation, pure linguistic transfer has long been obsolete. The translator reconstructs the story for a local context — adding background, adjusting emphasis, sometimes omitting. The person deciding what gets translated, what gets cut, and how context is framed is now an editorial actor. This comes with power, and with responsibility.

The third — and to my mind the most politically significant — is the cultural auditor. Machine translation post-editing is rapidly becoming an industry standard. But “post-editing” understates what is actually happening. The translator reviewing machine output is not correcting typos. She is detecting algorithmic bias, recovering cultural register, defending local specificity against the homogenizing pressure of the model.
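To make the auditing idea concrete, here is a minimal sketch of the simplest check a cultural auditor might automate: flagging English output that assigns a gender to a Turkish sentence whose pronoun "o" carries none. Everything in it is an assumption for illustration. The sample pairs are hypothetical (not logs from any real system), the names `SAMPLES`, `classify_gender`, and `audit` are invented here, and a word-list check like this catches only the crudest cases; register, idiom, and irony still need a human reader.

```python
import re

# Hypothetical (source, machine output) pairs, written for illustration only.
# Each Turkish source uses the gender-neutral pronoun "o".
SAMPLES = [
    ("O bir doktor", "He is a doctor"),
    ("O bir hemşire", "She is a nurse"),
    ("O bir öğretmen", "He or she is a teacher"),
]

MASCULINE = {"he", "him", "his"}
FEMININE = {"she", "her", "hers"}


def classify_gender(english: str) -> str:
    """Label an English sentence by the gendered pronouns it contains."""
    words = set(re.findall(r"[a-z]+", english.lower()))
    has_masculine = bool(words & MASCULINE)
    has_feminine = bool(words & FEMININE)
    if has_masculine and has_feminine:
        return "both"
    if has_masculine:
        return "masculine"
    if has_feminine:
        return "feminine"
    return "neutral"


def audit(samples):
    """Return the pairs where the output commits to a single gender.

    Assumes every source sentence is gender-neutral, so any single
    gendered pronoun in the output was introduced by the system.
    """
    flagged = []
    for source, target in samples:
        label = classify_gender(target)
        if label in ("masculine", "feminine"):
            flagged.append((source, target, label))
    return flagged


if __name__ == "__main__":
    for source, target, label in audit(SAMPLES):
        print(f"{source!r} -> {target!r}: output is {label}, source is neutral")
```

A check like this does not fix anything by itself. What it does is turn a translator's intuition ("the machine keeps making the doctor a man") into a count that can be put in front of a client or a vendor.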

The translator who corrects the machine is auditing the algorithm. This is not a professional evolution. It is a political act.


Conclusion

Back to where we started: Google Translate made the doctor male and the nurse female. Google noticed, and partially fixed it. But the deeper question remains: who noticed? Who decided what needed fixing? Who gets to determine which biases are worth correcting?

Everyone in the translation community — practitioners, educators, researchers — is inside this architectural intervention. We are each, depending on context, designers, materials, or auditors.

The Māori community said it plainly: data is the new land. They took that seriously because they know what losing land means.

We know things too. Turkish is a minority language in the digital world. Every day, millions of Turkish texts pass through models built elsewhere, trained on other languages, calibrated to other cultural norms. How well do those models know Turkish? Whose Turkish do they represent?

The translator’s new role is to become visible — not despite the algorithm, but inside it. To audit, to resist homogenization, to defend the particular against the average.

Because whoever translates the language builds the meaning.

 

