What Machines Don't Know

Imagining Language Without Imagination


It's important to acknowledge that Large Language Models are complex. Online chatter tends toward an oversimplified binary: anti-AI proponents dismissively characterize LLMs as mere "next-word predictors," while pro-AI advocates act as if the model were a perfect replica of the human brain. In many ways, "next-token predictor" is an oversimplification: it would be more accurate to say that LLMs are incredibly complicated next-token predictors.

For those blessed enough not to understand what any of that means, a quick explanation is in order. A large language model operates by tokenizing language: converting words into numerical values, and then embedding various pieces of numerical data about those values into a series of lists.

cat => 75

Every word in the training data gets such a list, and the numbers in the list represent its relationships to other words. That is a lot of information, all expressed as coordinates in a vast multi-dimensional space. The lists are just numbers, describing the relative position of each word within that space.

cat => 75 => [1.2, 2.4, 0.0, 0.0, 4.5 ...]
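To make those two snippets concrete, here is a minimal Python sketch of the same idea: a made-up vocabulary that maps a word to a token ID, and a made-up embedding table that maps that ID to its list of coordinates. The words, IDs, and numbers are invented for illustration; real models learn these lists, and they run to hundreds or thousands of dimensions.

# A toy illustration of tokenization and embedding lookup.
# The vocabulary, IDs, and vector values are invented for this example;
# real models learn their embeddings and use far longer lists.

vocabulary = {"cat": 75, "dog": 76, "mat": 203}

embeddings = {
    75:  [1.2, 2.4, 0.0, 0.0, 4.5],   # "cat"
    76:  [1.1, 2.5, 0.1, 0.0, 4.4],   # "dog" sits near "cat" in this space
    203: [0.2, 0.1, 3.9, 2.2, 0.0],   # "mat" sits somewhere else entirely
}

def embed(word: str) -> list[float]:
    """Convert a word to its token ID, then to its list of coordinates."""
    token_id = vocabulary[word]
    return embeddings[token_id]

print(embed("cat"))  # [1.2, 2.4, 0.0, 0.0, 4.5]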

Predicting the next token is how the model "selects" a word. Your prompt operates, essentially, by tuning dials until a series of words lines up in a mathematically constrained sequence. It is not a single prediction but a back-and-forth jostling of these positions until they fit a nice statistical contour guided by the values associated with the words in your prompt.
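For readers who want to see the "prediction" part spelled out, the sketch below treats the model as a black box that returns a probability for every candidate next word given the words so far, and treats generation as repeatedly drawing from that distribution and appending the result. The scoring function here is hard-coded and fake; in a real model those probabilities come from billions of learned parameters.

import random

def next_word_probabilities(context: list[str]) -> dict[str, float]:
    # A stand-in for the model: in reality these probabilities are computed
    # from the embeddings of everything in the context, not hard-coded.
    if context[-1] == "the":
        return {"cat": 0.6, "mat": 0.3, "of": 0.1}
    return {"the": 0.5, "sat": 0.3, "on": 0.2}

def generate(prompt: list[str], length: int = 5) -> list[str]:
    """Autoregressive generation: predict one token, append it, repeat."""
    words = list(prompt)
    for _ in range(length):
        probs = next_word_probabilities(words)
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights=weights)[0])
    return words

print(" ".join(generate(["the"])))  # one possible output: "the cat sat on the mat"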

Much of the most intense debate around AI defaults to "you don't even understand the technology." But I suspect the most significant distinction lies not in whether we understand how a model works, but in how we interpret what the structure is doing.

Thought Police

Many people attribute various legal and social values to LLMs based on their ability to "learn" textual relationships from training data and produce compelling text from what they ingest. Bolder claims assert the value of AI-produced text by degrading the thought process that motivates human speech. The claim, usually tossed around online, is that humans are also just next-token predictors – that the human brain is a pattern-finding machine and speech merely reflects this.

To believe this to be true, one would have to imagine that all human speech is motivated entirely by grammar. I have no idea if this belief is wrong; I'm just saying: you'd have to believe it. Now, I want to be careful here: human grammar – for example, the structure of sentences in the English language – is distinct from what we would call a "grammar" in an LLM. What I'm about to describe has a technical term in LLMs, "embeddings," but we can think of it as a ball maze.

A wooden ball maze, with knobs on the side, which tilt the surface of the maze so that a ball can move through narrow channels into holes.

The word is placed into a model, which is structured by the entire corpus of training text. When we prompt the model, the space around each word – the vector space – shifts, and activations flow through it, triggering paths through these "embedded tokens" (words) based on their relationships to previous tokens. Unlike a ball maze, where the goal is to avoid the holes, we can think of the LLM's constantly shifting vector space as an attempt to fit each word through a specific hole, or at least one close enough to a specific hole.

For every word in the LLM's output, the "ball" in this metaphorical maze is passing through thousands of mazes at once, depending on how many parameters the model has. We can imagine the ball moving through three-dimensional space, surrounded by a series of interlocked yet narrowly confined paths, with the system determined to find a path by which every steel ball drops through its proper hole. Once the "word" (the token representing the word) is slotted in, the model's reading of all the tokens around it is rejiggered until the sentence, or paragraph, "works." (The tuning of those relationships happens beforehand, during training, through a process called "backpropagation.")
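That rejiggering can be pictured, very loosely, as each word's vector being pulled toward the vectors of the words around it, with the strength of the pull set by how aligned those vectors already are. The sketch below is not the transformer's actual attention mechanism – which involves learned projections, multiple heads, and normalization – only a toy weighted average, with invented numbers, to show the shape of the idea.

import math

# Toy vectors standing in for token embeddings; every value is invented.
sentence = {
    "the": [0.1, 0.2, 0.0],
    "cat": [1.2, 2.4, 0.5],
    "sat": [0.9, 1.8, 0.4],
}

def similarity(a: list[float], b: list[float]) -> float:
    """Dot product: larger when two vectors point in similar directions."""
    return sum(x * y for x, y in zip(a, b))

def nudge(word: str) -> list[float]:
    """Pull one word's vector toward its neighbours, weighted by similarity.
    A loose stand-in for attention, not the real mechanism."""
    target = sentence[word]
    weights = {w: math.exp(similarity(target, v)) for w, v in sentence.items()}
    total = sum(weights.values())
    return [
        sum((weights[w] / total) * v[i] for w, v in sentence.items())
        for i in range(len(target))
    ]

print(nudge("cat"))  # "cat", shifted slightly toward the context it sits in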

Therefore, the grammar of an LLM is structured in constantly shifting ways. The words in the user's prompt become tokens that trigger a negotiation with the surrounding tokens, which influences the chance that any particular word will emerge in response to those around it. Every word has a long list of values that can influence, and be influenced by, the long lists of values linked to other words.

In human speech, every word pushes and pulls the others in new directions of meaning. To simulate this with a machine, we can stretch and reassign each "value" in the matrix of associated words. We can understand this process as math rather than language, and see how such math could create a compelling simulation of language.

Every word in a generated paragraph is a solution to this problem of mathematical sequencing. Which is a very different goal from human speech.

Nonetheless, this mathematical process enables settling a word into its surroundings rather than finding words to fit a meaning. This explains how models arrive at the contextual production of speech without any contextual understanding of the world: in the same sense that a ball can be dropped into a hole in a puzzle maze. It navigates not through conscious reflection on where it ought to be, but by following a structure that shifts around it. Language is "slotted in," rather than "produced." And it is humans who do all the work.

When Words Are Also Grammar

Any logic of an LLM is therefore linked to, and narrowly defined by, any given word's position in a series of matrices. It is literally formulaic. Plenty of human language is formulaic too. But the LLM uses its own machine "grammar" in a different way from human grammar, and this difference is crucial.

Human language is motivated by the articulation of thought; machine language is crafted through structure. Machine structure is a grammar that entirely dictates the production of language, and words are themselves considered part of the grammar, not individual referents to a broader concept.

As a result, the likelihood of finding new arrangements of words through an LLM is determined not by the capacity of AI to reason, but by its capacity to shuffle the expectations of a word's proper position – i.e., to loosen the range of slots a word could fill. The model does this by introducing noise, which can be controlled through a parameter known as "temperature" in most language models.
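"Temperature" has a concrete meaning here: before the model draws a word, each candidate's score is divided by the temperature and pushed through a softmax, so higher temperatures flatten the distribution and let less likely words slip through more often. A minimal sketch, with invented scores:

import math
import random

def sample_with_temperature(scores: dict[str, float], temperature: float) -> str:
    """Softmax over scores divided by temperature, then a weighted draw.
    Low temperature: the top-scoring word almost always wins.
    High temperature: more noise, more unlikely words slip through."""
    scaled = {w: s / temperature for w, s in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scaled.items()}
    total = sum(exps.values())
    words = list(exps)
    weights = [exps[w] / total for w in words]
    return random.choices(words, weights=weights)[0]

# Invented scores for the word that might follow "I miss ..."
scores = {"you": 4.0, "home": 2.5, "the": 1.0, "gravity": 0.2}

print(sample_with_temperature(scores, temperature=0.2))  # nearly always "you"
print(sample_with_temperature(scores, temperature=1.5))  # occasionally surprising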

But as with AI-generated images, any new collisions of meaning and arrangements of text are similarly cosmetic. It is serendipity. The difference noise introduces is left to the reader to determine: "Is this text a new idea, or is it noise?" But any new idea is ultimately the result of noise introduced into a rigid system and its consequent recalibration into legible text. LLMs produce neither reason nor a thoughtful consideration of facts. They can't anticipate the consequences of a word for the situation into which it will be added. Rather, they produce an output that plausibly approximates where words might land if reason or thought were present, based on where those words have generally landed when reason and thought actually were present (human writing).

Humans can, and do, write this way sometimes. Consider clichés and aphorisms, thoughtless texts and emails, and the semantic satiation of overuse: "I love you," "I miss you." How do these phrases compare to, or articulate, the experience of longing for someone you love? They do not, and so they serve as markers of a sentiment – markers that fail to fulfill what they mean to do. They fail notably in contrast to a poem written about missing somebody, which strives to find new arrangements of words to articulate an experience shared by millions, but in a uniquely meaningful way. In most cases, it is the effort of finding these words, not the choice of the words themselves, that moves us to embrace them.

Each word is ultimately a rule about where it can be placed, rather than a gesture to an experience...

Human language decisions follow rules of grammar too, but we have flexibility within these structures. We give rise to a thought and then articulate that thought based on which word, in which grammatical slot, best serves our internal concepts and conveys them to others. This is an important distinction from a machine determining the likelihood of a word's position amidst multiple shifting axes, even if the two are cosmetically similar. The difference, I propose, is that the words of an LLM and the grammar of an LLM are inseparable: each word is ultimately a very complicated set of rules about where it can be placed, rather than a gesture to an experience it might articulate.

To be clear, this doesn't diminish what LLMs can do, which can be impressive, though I am ultimately more impressed with their architectures – transformers and the like – than with their language production, given how badly understood the language they produce has come to be. LLMs are doing something, but it isn't what humans do with words.

A Machine Cannot Imagine Itself

Perhaps it was Sartre who suggested that consciousness is the ability to imagine itself. That is missing from the LLM, though some might argue otherwise: an LLM cannot imagine itself, though it can describe itself by slotting words into sequences. It can slot layers and layers of words into thousands of mazes simultaneously until it can create text about the text it has made, and then summarize that text and call it reason.

Shuffling prior text to prime future text is a clumsy concept of consciousness: the roots of LLM text are always driven by the position of a word in proximity to other words, rather than arising as a gesture of feeling, or connecting to true internal imaginations of one's own mind. That kind of feeling or imagining is not what the architecture of an LLM entails, regardless of the number of parameters involved.

But many of us fall prey to a strange paradox here: failing to recognize that there is no "imagination" behind an LLM's assertions about itself, we also fail to acknowledge that the text produced by the model is nonetheless imaginary – a hypothetical conjecture of symbols in proper slots whose connection to an imagined "self" is absent. The imagination is in the language, not the model, and it is socially activated.

Current architectures of LLMs cannot imagine, but they can sequence. They can operate within our imaginative symbolic frameworks, but they cannot use symbols because they cannot imagine themselves participating in the negotiation of those symbols. For the same reason that a dog can go to church but a dog cannot be Catholic, an LLM can have a conversation but cannot participate in the conversation.

A dog can "go to church" but a dog cannot be Catholic. An LLM can have a conversation but cannot participate in the conversation.

Some will claim, nonetheless, that this is still like human thought. The concern for me, as a humanist, is less about proving whether this is true one way or another, which as far as I can tell is a philosopher's coin toss. In the meantime, I think there is use in deciding whether or not we want to place these two kinds of language-making in the same category.

The decision to equate human thought with complex machine slotting has significant social implications. It presupposes that human expression is only and without exception the automation of grammar, that words always and without exception determine, for themselves, when they will appear. The mind becomes a vast mathematical vector space through which words assert themselves rather than a personal library through which words are, sometimes, found.

None of this will convince the convinced, and as I said: it's all a matter of interpretation. Those who make the case can argue that the lookup table is like consulting a thesaurus, missing the point that it is more like being forced to use a thesaurus and to follow the shift in meaning by rolling dice. There is a key distinction there, and I accept that I haven't quite articulated it yet. This is a newsletter, not a thesis.

But what is clear is that no neural network arrives on its own or imagines itself; it is wholly shaped by the data given to it. Even if an LLM were someday designed to find meaning in its words, it would arrive at conclusions steered by those who design the weights inside the system, on data selected for that system. Even if we could prove once and for all that a "world model" was any approximation of our own, that would make the matter of using the LLM to present your own ideas all the more worrisome.

The personal ceases to matter then, and so too does any real sense of "meaning" in the care of crafting a thoughtful phrase. We have always been bound to the constraints of language to express ourselves, though we can pair it with all kinds of things. Writing, in the worldview of human-machine equivalency, is always automatic: no staccato in the exchange of thought and articulation, just the steady drumbeat of statistically constrained lookup tables.


The Mozilla Festival!

November 7, Barcelona

The Mozilla Festival is happening in Barcelona starting November 7, and it has some amazing folks on the lineup focusing on building better technology. (Yes, this is a sponsored endorsement, but it's a genuine one!) You can hear from folks like Ruha Benjamin, Abeba Birhane, Alex Hanna, Ben Collins (from The Onion), the Domestic Data Streamers collective, and others you'll be familiar with if you've been reading here for a while.

It's going to be a great break from the constant drum of bad tech news, plus cool art and installations, like this database of online AGI hype.

Here's more info and your chance to buy a ticket.


Toronto, October 23 & 24: Who's Afraid of AI?

I'll be speaking at the "Who's Afraid of AI?" symposium at the University of Toronto at the end of October. It's described as "a week-long inquiry into the implications and future directions of AI for our creative and collective imaginings" and I'll be speaking on a panel called "Recognizing ‘Noise’" alongside Marco Donnarumma and Jutta Treviranus.

Other speakers include Geoffrey Hinton, Fei Fei Li, N. Katherine Hayles, Leif Weatherby, Antonio Somaini, Hito Steyerl, Vladan Joler, Beth Coleman and Matteo Pasquinelli, to name just a few. Details linked below.


Oslo, October 15: Human Movie (Performance & Discussion)

I'll be in Oslo to perform "Human Movie," followed by a panel discussion through the University of Oslo. More details are forthcoming; more information at the link.