Dreaming in Infinite Rooms
Finding relationships between data points is sometimes achieved through poetry.
“A man taken out of his room and, without preparation or transition, placed on the heights of a great mountain range, would feel something like an unequalled insecurity, an abandonment to the nameless that would almost annihilate him. He would feel he was falling, or think he was being catapulted out into space, or exploded into a thousand pieces. What a colossal lie his brain would have to invent in order to catch up with and explain the situation of his senses.” — Rainer Maria Rilke, Letters to a Young Poet
At an AI conference some years back, Facebook presented a 3D model of a suburban home it had rendered in 4K clarity. The team had gone into a home in the Palo Alto suburbs and captured it all with immersive cameras.
We could see their model of the house projected on a screen behind the speaker. In the model, the house went on forever, he explained, and the screen took us room to room. Meanwhile, another system was producing random arrangements of furniture and objects. In one room, a lamp to the left, in another, a lamp to the right.
They were trying to teach a robot how to wander. The house had become images, and those images became a virtual space. A robot, as a physical, embodied entity, doesn't actually need to wander around to learn how to stroll, because a robot only ever reacts to digital information gathered by a camera. So you train a navigation model to respond to the model of the infinite living room in Palo Alto.
All of the learning the robot was doing happened completely outside of its physical "body." Because there was no robot, there was no real movement either. And there was no real space. Only images.
The researcher began to move through the house as fast as the system could handle. It generated new arrangements of furniture and viable paths around them. The model sped up until repetitions and variations of the house flashed in front of our eyes. By speeding up the model, they could zip through thousands of miles of one Palo Alto home’s living room and kitchen in a few hours. Removing the robot's body, and its slow embodied-robot speeds, meant they could practice object detection across the span of a couple thousand Earths.
It is moving through a dream world of infinite houses, the story it told itself through the data we gave it. This works because that is exactly what the robot does in real space, once that learning is put into its "body." It does not see anything but information, so it can train on nothing but information.
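To make the logic concrete, here is a toy sketch of that kind of training loop. This is not Facebook's system, just the same idea at grid-world scale, with my own placeholder numbers: shuffle the furniture in one fixed floor plan, ask a stand-in "policy" (here, a simple breadth-first search) to find a path through each variation, and repeat thousands of times faster than any physical robot could move.

```python
# Toy sketch: domain-randomized navigation practice in one endlessly reshuffled room.
# Illustrative only; the room size, furniture count, and BFS "policy" are placeholders.
import random
from collections import deque

WIDTH, HEIGHT = 10, 8          # one "room," reused forever
N_FURNITURE = 12               # obstacles reshuffled on every episode

def random_room(seed):
    """Return a random set of blocked cells, keeping start and goal free."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]
    return set(rng.sample(cells[1:-1], N_FURNITURE))

def shortest_path_length(blocked, start=(0, 0), goal=(WIDTH - 1, HEIGHT - 1)):
    """Breadth-first search standing in for a learned navigation policy."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        (x, y), dist = queue.popleft()
        if (x, y) == goal:
            return dist
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < WIDTH and 0 <= ny < HEIGHT \
                    and (nx, ny) not in blocked and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append(((nx, ny), dist + 1))
    return None  # this arrangement has no viable path

# Thousands of variations of the "same" room, traversed in seconds.
solved = sum(shortest_path_length(random_room(seed)) is not None
             for seed in range(10_000))
print(f"navigable arrangements: {solved} / 10000")
```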
But that information all comes from a suburban home in Palo Alto. (It has since diversified). You still can't ask the system what the images are, or why it can move through some spaces but not others. When deployed, the system takes the knowledge it has and tries to understand the world through that knowledge. But the “learned” world is just a metaphor, and the machine is navigating gaps between the metaphor and the world.
We imagine this process differently, and we tell another story about what it is doing, one that takes the task at face value. I think it’s crucial for those stories to be understood as part of every cyber-physical system: in a lot of ways, they are the most dangerous part.
We’ve been here before. Early AI research — we're talking the 1960s — focused on getting machines to pick up and move toy blocks around. In the 1980s, MIT researcher Rodney Brooks described how this "blocks world" was meant to solve lots of AI problems. Blocks were solid, and space was not, and so you had this neat binary.
For humans, blocks could represent anything. In theory, if your robot could move a block, that learning could scale. You could extrapolate it out to moving anything, from putting silverware away to taking shipping containers off of boats. It’s no different from the way a child uses blocks: the same block is a building, a car, a dinosaur, depending on what the imagination demands for the moment.
Soon, Brooks writes, the block model of the world for AI was dismissed as a “toy world.” Yes, it had created systems that moved blocks. But that turned out to be pretty irrelevant to moving silverware or shipping containers. Brooks illustrated the problem this way:
"A person sitting on a chair in a room is hungry and can see a banana hanging from the ceiling just out of reach. Such problems are never posed to AI systems by showing them a photo of the scene. A person (even a young child) can make the right interpretation of the photo and suggest a plan of action. For AI planning systems however, the experimenter is required to abstract away most of the details to form a simple description in terms of atomic concepts such as PERSON, CHAIR and BANANAS. ... Thus, because we still perform all the abstractions for our programs, most AI work is still done in the blocks world. Now the blocks have slightly different shapes and colors, but their underlying semantics have not changed greatly."
That was in 1987, and it seems to remain true today. For example, a CNN can't tell you what makes the objects it sees different, even when it can differentiate between them. These systems don't, and likely never will, "know" why the objects are different, even if they can tell us that the objects are different. Likewise, they are terrible at generating their own abstractions, such as analogies. Thirty-five years later, Melanie Mitchell explains:
"Focusing on logic and programming in the rules for behavior — that’s the way early AI worked. More recently people have focused on learning from lots and lots of examples, and then assuming that you’ll be able to do induction to things you haven’t seen before using just the statistics of what you’ve already learned. They hoped the abilities to generalize and abstract would kind of come out of the statistics, but it hasn’t worked as well as people had hoped. You can show a deep neural network millions of pictures of bridges, for example, and it can probably recognize a new picture of a bridge over a river or something. But it can never abstract the notion of “bridge” to, say, our concept of bridging the gender gap. These networks, it turns out, don’t learn how to abstract. There’s something missing. And people are only sort of grappling now with that."
AI pioneer Joseph Weizenbaum wrote about this using an analogy of his own. He asked us to imagine trying to get home by flipping a coin to determine whether to go left or right. Even on the off chance that the coin-flip system gets you home, it doesn't mean the coin knew where you were or where you lived. Mitchell sums it up: "These systems don’t understand, in any humanlike sense, the data that they’re dealing with."
Shortly after a prolonged and painful breakup in San Francisco, I came across a handwritten sign taped to the glass door of a local market: “Please — closing the door, but gently.” The grocer had no intent of tapping into something in me, but had done so anyway. This was found poetry, serendipitous and meaningful even in the absence of any intent. The coin was flipped, and it landed on language that stuck, language I didn’t know I needed to make sense of the emotional landscape I was slogging through.
In my artistic practice I've been working with a language-generating system, GPT-3, and its predecessor, GPT-2. The GPT-2 model was trained on a series of texts that leaned toward the political: Situationist writings, critical theory texts, cybernetics texts. The output is often a compelling arrangement of ideas. Reading them feels like reading Deleuze, or Lacan, or Foucault. (Arguably, that's because these people frequently wrote like random-language-generating machines.) If you take the writing literally, it becomes quite hard to understand. If you project, interpret, and analyze it, you start seeing connections and ideas. But where are these coming from, really?
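For readers curious what "working with" these models looks like in practice, here is a minimal sketch of the generation step using the openly available GPT-2 weights through the Hugging Face transformers library. The prompt and sampling settings are placeholders of my own, and the sketch uses the base model rather than the corpus of Situationist, critical theory, and cybernetics texts described above.

```python
# Minimal sketch of sampling text from GPT-2 with Hugging Face `transformers`.
# The model name and prompt are placeholders, not the fine-tuned setup described above.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The spectacle is"                 # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling (rather than greedy decoding) is what gives the output its loose,
# associative, found-poetry quality.
outputs = model.generate(
    **inputs,
    max_length=80,
    do_sample=True,
    top_p=0.9,
    temperature=1.0,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```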
I put together this video a while ago, which I'm sharing here for the first time.
The music was written by a music-generating algorithm. The text comes from GPT-3. I added the imagery myself. I intended for the images to serve as a poetic layer of the system, closing gaps in meaning.
“Poetry” is, sometimes, what happens when you abstract out from one piece of information, take what remains of its meaning, and then re-apply that to some other, precise thing. This can be emotional, murky, unclear, imprecise. But we often rely on analogy and metaphor to communicate the uncommunicable reality of the inner world.
The connections we make between signs suggest a relationship in the absence of any clearer explanations. We should not be afraid of this relationship, but maybe see generative art as we see cinema or painting: a world presented to us, for us to immerse ourselves in, that may touch us. Then we must withdraw from that imagined world, and reflect upon the experience of having been lost in it for a while, and think about the meaning of what we saw there: what the thing said to us.
All of this thinking about AI systems navigating the world comes back to this question of how AI art might help, or hinder, us in emotionally navigating an AI-mediated world. Much of the artistic output of these generative systems is, after all, a simulation of art: they are trained to predict the next image in a sequence of historical still lifes, portraits, or landscape paintings. They are trained to predict a song’s melody based on the entire Western classical music canon. We seem impressed by this, but I haven’t met very many people who are moved by it.
Like the robot navigating an infinite Palo Alto living room, the raw output of these generative art systems reflects only what they have encountered: culture as the same room, infinitely repeated, learning from the limits of variation within those rooms. There is no authorial intent here, though we may be tempted to look for one.
The notes, the furniture, the colors: changing, but drawn from finite palettes. It’s the interaction with the real world — or the ways we imagine it — that produces a working system. That includes our imaginary relationships to the things that indifferent systems produce.
Sometimes there are shades of panic in the way I see AI art. It’s as if the machine is somehow getting deep into my psyche, colonizing culture as data and spitting something out. I think that reflects the weaponized ideology of broader data practices today: this is exactly what machine learning is doing, often with catastrophic results. And much of that comes from how we imagine the links between our imaginations and the machine’s “imagination.”
The machine’s imagination is reaching to find patterns and relationships, and as we have seen, it is doing so badly. Like us, it finds them even when such patterns and relationships do not exist, resulting in something like poetry from a different source: a found-poetry engine.
At the moment, we respond to this machine “imagination” in the same way that we find meaning within a human-produced painting, or poem, or film, or television advertisement. We imagine ourselves within those worlds. We do this within our private mental spaces, but we hand over some internal control to the artists, poets, or marketing agencies. When we do, our story and their stories become temporarily intertwined. Whether we are being manipulated by poets or design houses, we know a human was behind it.
Even the most alienating and experimental of these communication forms are shaped by empathy, by that desire to speak and be heard. Machines, in simulating art, do so without any desire to connect or reassure us. The machine is not concerned with being understood, because it doesn’t, and cannot, understand. In the distance between us and it, we project all that we fear from the Other: infallible, all-knowing, all-aware, yet incompetent, exploitative, harmful — when we find meaning, we interpret it as anything but random, serendipitous, or accidental. We interpret through a lens of fear and the evidence supports it. I am used to the sense that the screen is always there to take something from me, package it up, and offer it back through the recommendation of some distant system.
The uncanniness — that close-but-not-quite-human quality of machine-generated text and images — is a different way of intermingling imaginations because we imagine it to be different. I can taste the mediation on my tongue, like licking a battery. The image quality is not so clear, and so the limits of the machine imagination intertwine with a human desire to be immersed. We can see the colossal lie, much like Rilke describes. We can see how the imagination reaches, and sometimes fails, and it can be terrifying.
* Thanks to Alan Sondheim, Max Herman, Paul Hertz, Stephanie Strickland and Graziano Milano for informing this text by responding to ideas posted on the Net Behaviour mailing list.
Things I’m Doing This Week
More news on this later, but I’m excited to share that Şerife Wong and I have a project with the Excavations: Governance Archaeology for the Future of the Internet residency with the CU Boulder Media Enterprise Design Lab and King’s College London. We’ll be reimagining blockchain technology through a Situationist lens.
More on the residency:
Excavations will employ a research-creation model based on the exchange between the social sciences and art practice, in the context of online community governance. How can communities govern platforms in the age of algorithmic governance? Who is accountable to whom, and how? How is labor distributed between code, bot, land, and flesh? How is identity negotiated between what is fluid and verified? What are instances in which freedom of expression is in conflict with regulation?
Things I’m Reading This Week
Aesthetic Flattening
Cade Diehm & Jaz Hee-jeong Choi
I’ve written enough this week, but this essay was really helpful in thinking about the “Zoom Year” many of us just lived through / are living through. It analyzes what made Zoom / video windows so difficult, and opens some paths for rethinking how they might be designed.
Aesthetic flattening [is] a highly reductionist substitute for human interaction. This lossy, compressed parsing of almost all creative or expressive output is the logical result of a computing world accelerated by the pandemic. Aesthetic flattening has been written about extensively, if often indirectly. It is a core source of angst in reflections on the hypermodern vulgarity inherent in the same software being used for everything from professional meetings to remote birthdays to funerals, or the absurdity of rushed, voyeuristic on-screen pedagogical endeavors, marred by limited support and buffer for failures.
That’s enough words for this week! Please, do feel free to share the newsletter if it strikes you, and do subscribe if you’re keen for more.
-er.