Address Not Found (Part 1)
Can we understand AI systems as communication networks?
By now a few things should be obvious. The way we organize information is a powerful means of shaping the world. The decisions that motivate us to collect certain facts (and neglect others) reflect a set of priorities: a choice of what is relevant and irrelevant, what belongs inside or outside of a category.
The way this organization happens can itself be categorized: we might call it "discourse," that is, the public contestation of categories and the kinds of rights and restrictions those categories afford. We navigate knowledge through dialogue, and when dialogue breaks down, we might rely on civil disobedience, or even violence. Social movements are galvanized to challenge the categories assigned to people by those who make laws or enforce them.
There are more mundane categories that can shape our imaginations and understanding of the world as well. For example, we categorize codes of discourse. Some forms of discourse are acceptable, while others are deemed unacceptable. One such categorization of discourse is “rational” versus “emotional,” wherein those who are unaffected by a crisis can establish themselves as more relevant than those who are upset by it. As a result, the discourse itself becomes contested. In other words, before you can change society’s categorizations, you also need to change the acceptable categories of conversation around changing those categories.
Libraries and academia are not exceptions to this. The categorization of knowledge into academic subjects and departments, and their notorious isolation, may amplify certain specificities of research: there are fields where going very deep into a single component is incredibly valuable. But the cross-pollination of ideas between specializations has been notoriously difficult, even though interdisciplinary exchange is one of the strongest indicators of innovation in science.
Advocates for interdisciplinary collaboration push to dissolve these boundaries within individual projects, which takes us back to the idea of categories being contested through discourse. This is a purely social activity, but increasingly, our communication technologies are reducing, not amplifying, that social activity. Consider that communication needs to become coded to enter into what we call electronically mediated discourse.
Social media is governed by algorithms that categorize your sentiment. Twitter, for example, suggests your comments to those who have liked or responded to similar comments in the past. Through the very act of categorizing your discourse, your comments are filtered to those who are most interested in them. Your sentiment becomes encoded and then, from that categorization, the algorithm decides who would engage with your comments (a toy sketch of this routing follows the quote below). Bernard Dionysius Geoghegan, in Code: From Information Theory to French Theory, writes that Norbert Wiener described exactly this risk in the 1950s,
“that larger communities embodied diminished networks of information. In multiplying the size and relays of society, technical communication transformed organic social community into a relation of senders and receivers whose reality was hidden behind the complex system of codes and relays that held them together. … in Wiener’s description, society takes the form of an informational system that becomes more disordered as local face to face relations give way to an overload of relays.” (126).
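To make the routing that Twitter-style feeds perform concrete, here is a toy sketch of engagement-based filtering. Everything in it is hypothetical and vastly simpler than any real ranking system, but the shape is the point: a message is scored against each potential receiver's history of engagement, and the intended receiver never enters into it.

```python
from collections import Counter

def score(message_tags, user_history):
    # Count how many of the message's category tags this user has
    # engaged with before; a crude stand-in for feed ranking.
    history = Counter(user_history)
    return sum(history[tag] for tag in message_tags)

# A comment is "encoded" into sentiment/topic categories...
message_tags = {"politics", "angry"}

# ...and routed to whoever engaged with similar categories before.
users = {
    "alia": ["politics", "politics", "sports"],
    "ben": ["cooking", "travel"],
    "cam": ["angry", "politics", "angry"],
}

ranked = sorted(users, key=lambda u: score(message_tags, users[u]), reverse=True)
print(ranked)  # ['cam', 'alia', 'ben']: delivery follows engagement, not intent
```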
If this is a problem within large communication networks, then it is also a problem with large datasets being distributed through large neural networks, which is what we talk about when we talk about Large Language Models like GPT-4 or, for that matter, image-generating tools like DALL-E 2 or Stable Diffusion.
AI is tasked with moving information from one place to another, adapting it from one form to another, through a neural network. On one end, information is encoded (through collecting, aggregating and organizing data) and on the other, information is decoded (through the production of images and text influenced by that data).
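To keep that frame concrete, here is a deliberately minimal sketch of the classic Shannon-style pipeline this essay leans on: sender, encoder, noisy channel, decoder, receiver. Nothing here corresponds to any real model's architecture; it only shows how a message can degrade between encoding and decoding.

```python
import random

def encode(text):
    # "Collecting and organizing": reduce a message to a lossy code.
    return [ord(c) for c in text.lower()]

def channel(signal, noise=0.3):
    # The relay: each symbol risks corruption in transit.
    return [s + random.choice([-1, 1]) if random.random() < noise else s
            for s in signal]

def decode(signal):
    # "Production of text": reconstruct a message from the code.
    return "".join(chr(s) for s in signal)

random.seed(7)
sent = "hello from the sender"
received = decode(channel(encode(sent)))
print(received)  # close to what was sent, but not quite it
```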
Senders and Receivers
A striking change in the way communication takes place when it is digitally mediated is that the receiver may not always be the one for whom the message was intended. In essence, algorithms like Twitter's "For You" feed have a problem: they are the worst post office in the world. Every message written goes to the wrong address (or hundreds of wrong addresses). The very question of who the intended receiver is becomes hopelessly convoluted when we talk about ourselves as broadcasters. The context of a message can be stripped of our original meaning as a result: sarcastic or ironic posts are delivered to people who don't know who we really are, and who are therefore unable to decipher the irony in the messages they see us sending.
We might understand, then, the relationship between a text we write, or an image we create, and the path it takes through the electronic communication relays of artificial intelligence systems. We can understand Large Language Models, or image synthesis tools, as a communication relay between sender and receiver: a person encodes meaning into an image or a text, and the system decodes that meaning, categorizes what it can of it, then re-encodes and delivers a new message in what amounts to the world's worst game of "telephone."
(Notably, this is deliberate, as in the case of diffusion models literally dissolving images into noise, the absence of communication signal, and then reassembling them; or the reduction of text into tokens, sorted into hierarchies of likeliest pairings.)
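For readers who want to see the "dissolving into noise" step, here is a minimal sketch of the forward diffusion process. The one-dimensional "image" and the fixed five-step schedule are stand-ins; real diffusion models operate on image tensors over roughly a thousand steps with carefully tuned schedules, but the closed-form equation is the standard one.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy one-dimensional "image": the clean signal x_0.
x0 = np.sin(np.linspace(0, 2 * np.pi, 8))

# A fixed noise schedule; alpha_bar falls toward 0 as the steps advance.
for alpha_bar in np.linspace(0.99, 0.01, 5):
    # Standard forward-diffusion closed form:
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * epsilon
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * rng.standard_normal(x0.shape)
    print(np.round(xt, 2))  # the signal drowns in noise, step by step
```

Generation runs this in reverse: a trained network starts from pure noise and repeatedly estimates what signal might have dissolved into it.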
Categorizations within image models are quite clear. We can see an example in this collection of 16 portraits created from the prompt “British Person”:
You will notice that "British People" in this model's categorization are nearly exclusively old white men wearing hats. There is a categorization at play here, one structured by a dominant social imagining of what "British People" look like, and one very much at odds with residents of Britain who come from its former colonies or, for that matter, who are women.
So the category of "British Person" has been inscribed into the model. These are merely the first sixteen results, generated a month ago as I was preparing images for a lecture at Cambridge University. By now, the output may have changed. Who can tell? That's part of the problem: when we can't see the path of the messages we send or receive, we have no way of knowing how to clarify a message, or to contest it.
We can consider these images a kind of misdirected letter. A series of photographs was posted online, across an endless series of websites, wherein pictures of older white men with hats came to be strongly associated with the idea of "British Person." Likely, it was some melange of elements: not just white or elderly or male or hatted, but each of these elements present together across a vast collection of images.
Any one of the people creating these photographs or images is the sender of this message, in a sense. But if social media networks are known for sending your message to the wrong receivers, then neural networks disregard the message altogether. What results is a scrambled reassemblage of noisy signals (literally). The diversity of the images is lost as the system breaks individual images down and creates a kind of collective space from which to sample in response to the prompt.
Likewise, Large Language Models are inherently categorizing systems, where information is not even assigned to categories such as "true" or "false" but "most likely." Again, the same principle applies: the language model has learned from text created by people for specific audiences, but it has collapsed any intended audience. Language has been stripped of its context and reassigned according to statistical word flow. For facts, for perspectives, this is a far cry from a melting pot. The statistical correlations are shaped by weight: the more often words appear together, the stronger the association between them becomes.
Words become individual sticks bundled together, finding strength through similarly structured categories and associations. We may understand that words and images are biased signals. Less often do we talk about how those signals are moved through biased structures, or structures designed to amplify certain aspects of those signals.
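To see what "most likely" means in practice, here is a minimal bigram counter, a made-up toy that stands in for what training a language model amounts to at vastly greater scale: counting which words follow which, then sampling from those weights.

```python
from collections import defaultdict, Counter
import random

corpus = ("the cat sat on the mat . the cat ate . "
          "the dog sat on the rug .").split()

# Count how often each word follows each other word: the "weights."
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

# Generation is just repeated sampling of likely continuations;
# nothing in the output was ever checked as true or false.
random.seed(3)
word, out = "the", ["the"]
for _ in range(6):
    nxt = follows[word]
    word = random.choices(list(nxt), weights=nxt.values())[0]
    out.append(word)
print(" ".join(out))  # fluent-looking word flow with no sender behind it
```

Real models replace counts with learned parameters over token sequences, but the categorization is the same in kind: likelihood, not truth.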
Automated Pattern Recognition as a Communication Network
Right now we see AI-generated images and text as media. If we want to understand these products as media, we should understand them as communication. If we understand them as communication, then we might benefit from analyzing so-called "AI systems" as communication networks rather than as "authors," "artists," or even more nuanced ideas such as "automated pattern recognition" (which is what they are!). On one end of the system there is human discourse and the products, images and text, that make that discourse possible. On the other end are consumers of the mediated version of that discourse: the neural network, translating and re-encoding whatever it "learns" as the unintended receiver of that discourse.
The system through which it categorizes and labels this discourse is an exercise in power, mediated through technology companies. Whereas we have always expected communication technologies to carry the sender's signal to its intended receiver, these systems are, by and large, completely unconcerned with that. There is no specific sender, so there is no specific context or intention. There is only a collective sampling of noise filtered into the semblance of a message.
For AI to be what it promises to be (a search engine, an educational tool, an encyclopedia, a politically neutral observer and aide to human progress) it would have to be understood as a communication system. As a communication system, it is terribly designed: it is deliberately noisy, impersonal, decontextualized, and closed. Closed, as in, there is no room for new or under-represented ideas to challenge the weighted biases of the language model. It is a set of hegemonic ideas disguised as a social "consensus."
While these tools exist alongside discourse (the discourse doesn't end), they are an enclosure around a certain set of ideas, one that these companies promise to infuse into nearly every aspect of our lives, from psychology to education to eating disorder hotlines. If we are so inclined, it is worth asking, then, who the author of this automated text is. It is not the system itself, because the system is simply a communication network, moving a collective archive of text into new arrangements in the hope that it fits a new purpose.
I think the framing of AI systems — that they are used to communicate, without any clear origin for what is being communicated — creates a gap, which we fill by projecting the illusion of intelligence and expression. This masks something fundamental to these systems: there is nothing being communicated. Rather, information is being assembled from a melange of messages.
In part two, next Sunday, I want to propose a shift in how we frame communication types: from the one-to-one (telephone, telegraph, text messages) to the one-to-many (newspapers, radio, television) to the many-to-many (social media and the internet) to the many-to-one (generative synthesis and large language models).
See you then!
Thanks for reading! I am traveling this summer and posts may be sporadic, but I hope you’ll consider subscribing — and sharing this post if you found it interesting!