Whose Truth Gets Optimized? AI and the New Epistemic Order


This is an edited version of my keynote talk delivered at the 4th International Communication, Digitalization and Society Symposium (ICDS), Istanbul Aydın University, April 28, 2026.


The theme of this year’s ICDS symposium — Truth in an Optimized Society: From Democracy to Infocracy — is a genuinely productive one. The concept of infocracy captures something real: a condition in which the management and optimization of information has begun to displace the deliberative logic that democracy depends on. I want to accept that framing and then immediately complicate it. Because the question the field has not yet sufficiently asked is: whose democracy, and whose infocracy?

The “optimized society” presupposes a universal subject being optimized. That subject does not exist. What exists are specific knowledge systems, specific languages, specific histories — some of which get encoded into AI as default reality, and some of which get left out entirely. The shift from democracy to infocracy is not a shift that affects everyone in the same way, from the same starting position. It is a shift that happens on top of — and through — existing structures of epistemic inequality. That is what I want to explore here.


Optimization Is Never Neutral

When we say that AI “optimizes” for truth, or for relevance, or for quality, we need to ask: optimizes according to what training data, what linguistic corpus, what cultural assumptions about what counts as a credible source?

I want to introduce the term epistemic colonialism here — not as a metaphor, but as a structural description. The global AI stack was built on a massively skewed corpus: English-dominant, Western-institutional, shaped by particular notions of expertise, evidence, and epistemic authority. When that stack becomes the infrastructure of infocracy, it doesn’t simply distribute information unevenly. It distributes reality unevenly.

This is importantly different from what we usually call algorithmic bias. Bias implies an accidental deviation from a neutral norm. But there is no neutral norm. Every AI system encodes an epistemology as its baseline — a set of assumptions about what kinds of knowledge count, whose voices carry weight, which events deserve documentation, which languages contain enough data to be taken seriously. The question is not whether a given system is biased relative to some neutral standard. The question is: whose epistemology is the baseline, and how did it get there?


The Algorithmic Canon

I find it useful to think about this in terms of what I am calling the algorithmic canon. Literary canons determine which texts count as authoritative cultural knowledge — which works are studied, cited, preserved, and treated as the standard against which other works are measured. AI training corpora function in an analogous way, but at a scale and with a reach that no previous canon has approached. They constitute a new kind of canon: largely invisible, presented as a technical artifact rather than a political one, and far more consequential than any literary or scholarly canon because it shapes the outputs of systems that billions of people use to understand the world.

Three cases help clarify what the algorithmic canon looks like in practice.

The first is Te Hiku Media, the Māori language organization in Aotearoa New Zealand that made the deliberate decision not to donate their language data to large AI systems. This was not a failure of engagement with technology — it was a sophisticated act of epistemic sovereignty. The community understood that contributing their language to an AI training corpus would mean ceding control over how that language, and the knowledge embedded in it, would be represented and retrieved. Canonization, in this sense, means loss of control over meaning.

The second is Google Translate, and the well-documented gender and cultural biases that have shaped its outputs. These are not bugs to be patched. They are the logical output of a corpus that reflects existing social hierarchies. When a translation system defaults to masculine pronouns for doctors and feminine pronouns for nurses, it is not malfunctioning — it is faithfully reproducing the epistemic assumptions of its training data. The “neutral” tool replicates the canon’s hierarchies.
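
To make the mechanism concrete, here is a deliberately minimal sketch in Python. The corpus counts are invented, and nothing here resembles Google Translate's actual architecture; the point is only that a system choosing the most likely output given its corpus is working exactly as designed when it reproduces that corpus's hierarchies.

```python
# Toy illustration (not any real translation system): choosing an English
# pronoun for the genderless Turkish pronoun "o". The co-occurrence counts
# below are invented for demonstration purposes.
from collections import Counter

# Hypothetical pronoun co-occurrence counts harvested from a training corpus.
corpus_counts = {
    "doctor": Counter({"he": 8200, "she": 1800}),
    "nurse":  Counter({"he": 900,  "she": 9100}),
}

def translate_pronoun(profession: str) -> str:
    """Pick the pronoun that maximizes corpus likelihood.

    There is no 'neutral' fallback here: the argmax simply
    reproduces whatever hierarchy the corpus encodes.
    """
    counts = corpus_counts[profession]
    return counts.most_common(1)[0][0]

# Turkish "o bir doktor" and "o bir hemsire" use the same pronoun "o",
# but the model's English output inherits the corpus's gendered defaults:
print(translate_pronoun("doctor"))  # -> "he"
print(translate_pronoun("nurse"))   # -> "she"
```

Nothing in this sketch is broken; every component behaves as specified. That is precisely the point: the bias lives in the data, not in a defect of the code.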

The third case is closer to home. Turkey has several active initiatives to develop indigenous large language models — T3 AI and TÜBİTAK BİLGEM among them — framed officially as AI sovereignty projects. I think these initiatives deserve critical attention rather than simple celebration. The crucial question is whether they are building genuinely alternative epistemic canons, encoding different assumptions about knowledge, authority, and cultural memory, or whether they are replicating the structural logic of the dominant AI paradigm in a different language. Sovereignty over the infrastructure does not automatically translate into epistemic sovereignty.


AI Grooming: The Active Form

So far I have been describing epistemic colonialism and the algorithmic canon as structural conditions — features of how AI systems are built, trained, and deployed. But there is an active, deliberate dimension to this as well, and this is where my own ongoing research becomes most directly relevant.

I have been developing the concept of AI Grooming to describe the systematic manipulation of AI training data, feedback loops, and knowledge architectures by state or para-state actors in order to reshape collective epistemic perception — to make certain truths feel natural, certain actors appear legitimate, certain histories seem settled. The goal is not simply to deceive today’s reader. It is to shape what the machine will “know” tomorrow.

The distinction from conventional disinformation is important. Disinformation operates on outputs: false content is produced and distributed to deceive human readers. AI Grooming operates upstream: the target is not the current information environment but the training corpus that will shape future AI systems. It is, in the most literal sense, grooming — the patient cultivation of epistemic conditions that will bear fruit in the next generation of models.

The Pravda Network provides the clearest documented example. This is a coordinated network of hundreds of pseudo-local news websites, generating content at industrial scale, designed to appear as genuine local journalism in countries across Europe and beyond. The sites are written to be indexed, scraped, and absorbed into AI training pipelines. Their goal is not primarily to persuade today’s reader — it is to pollute the training environments of tomorrow’s AI systems with a particular version of political reality. The operation is scalable, deniable, and persistent across model versions. Once groomed data enters a training corpus, it is extremely difficult to identify and remove.
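
To see why this operates below existing defenses, consider a sketch of the kind of quality filter that web-scale pretraining pipelines typically apply at ingestion. Everything here is hypothetical: the function names and thresholds are mine, not any actual lab's code. But the structural point holds, because filters of this kind inspect the surface of the text, not its provenance.

```python
# Hypothetical ingestion filter of the kind common in web-scale pretraining
# pipelines. Names and thresholds are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str
    language: str

def passes_quality_filter(page: Page) -> bool:
    """Typical corpus filters check surface properties of the text:
    length, language, boilerplate ratio. None of these checks asks
    who produced the page, or why."""
    long_enough = len(page.text.split()) > 200
    supported_language = page.language in {"en", "fr", "de", "tr"}
    low_boilerplate = page.text.count("cookie") < 3  # crude proxy
    return long_enough and supported_language and low_boilerplate

# A groomed pseudo-local article is engineered to satisfy exactly these
# surface checks, so it flows into the corpus alongside real journalism.
groomed = Page(
    url="https://example-local-news.invalid/story-4821",
    text="word " * 500,  # stands in for fluent, long-form generated copy
    language="en",
)
assert passes_quality_filter(groomed)  # indistinguishable at this layer
```

A page built to pass these checks is, at this layer, indistinguishable from genuine local journalism, which is what makes the operation scalable and deniable.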

What makes this especially significant for the field is that it operates entirely upstream of the interventions we have developed to address disinformation. Fact-checking, content moderation, platform governance, media literacy — none of these tools reach the level at which AI Grooming operates. By the time a piece of groomed content has been flagged as false, it may already have been incorporated into the training data of the next major model.


Toward a New Vocabulary

This brings me to what I think is the most urgent intellectual task for communication and media studies right now: developing a vocabulary that operates at the level of infrastructure rather than content.

The concepts we have — disinformation, fake news, post-truth, filter bubble, media bias — all address what happens to information in circulation. They are content-level concepts, suited to a media environment in which the primary problem was false or misleading content reaching human audiences. That environment has not disappeared, but it has been joined by a deeper problem: the systematic shaping of the epistemic infrastructure that AI systems will use to generate, evaluate, and retrieve knowledge.

I want to propose a loose conceptual map. Epistemic colonialism names the structural encoding of particular knowledge hierarchies into AI as default reality — the condition from which we begin. The algorithmic canon names the invisible corpus of what AI “knows,” and draws attention to the political character of decisions about what gets included and excluded. AI Grooming names the deliberate manipulation of that canon by political actors — the active weaponization of the structural condition. And epistemic authoritarianism names the endpoint of grooming: a condition in which truth is locked in at the infrastructure level, before the public even enters the picture.

This is more than an extension of the democracy-to-infocracy axis the symposium proposes. It is a different kind of threat — one that does not merely flood the information environment with noise, but attempts to quietly determine what future AI systems will treat as knowledge in the first place.


The Question That Remains

I want to close with the question I think this analysis makes unavoidable.

If the algorithmic canon is already being written — and it is — then the critical question is not only who trains the models, in the technical sense. It is which communities, which languages, which knowledge traditions have the standing to say: this is what reality looks like from here, and it belongs in the machine. This is a question about epistemic rights, about data sovereignty, about whether the infrastructure of AI will reproduce the colonial hierarchies of previous knowledge systems or genuinely expand the range of what counts as authoritative knowledge.

The move from democracy to infocracy is real. But the more important move is the one happening beneath it — the quiet consolidation of whose truth the infrastructure will optimize. That is the move communication scholars need to be watching, and intervening in, right now.


This post is part of my ongoing research on AI Grooming, epistemic colonialism, and the politics of AI knowledge infrastructures. Related work can be found in earlier posts on this blog.

