What does it mean to break a system?

A Studio Visit as Exploration of Technical, Social and Aesthetic Breakage

This post is meant to be a bit of a “studio visit.” It shows some works in progress and what I am thinking about in my work. I was recently teaching a summer class where I had been invited to speak as an artist, and about 90 minutes in I opened the Q&A and was asked if I could show any of my own work. That kind of stunned me, because I was invited as an artist and had spent 90 minutes unpacking AI without showing any of my own work! It was understandable in a way, as I was meant to talk about AI too, and that can be quite time-consuming.

But I wanted to open up the studio — which is essentially a desktop computer on top of a very narrow baby blue IKEA desk — to show some of my works in progress and where I am in my thinking at the moment.

As an artist and a researcher, I often say that I aim to break generative AI systems. I like to find the limits of what the systems do and discover how they respond when those limits are pushed. This is useful as an orientation, because tech companies are constantly setting those limits in opaque ways. Inevitably, research into black-box systems requires us to observe how the system responds to a query or action, because we can’t see the decisions that would be written into code or office memos.

But there are very good reasons to be skeptical of this position. One of them comes down to definitions. Paul Pangaro asked me, in response to a recent post, what I meant by breaking the system. Is there, in my experiments, a kind of implicit trust in the system’s honesty about its glitches?

Here’s an example. One of my go-to tests for any new diffusion model is to prompt it to produce images of various types of noise. I have written about this before, but the model is designed to add and subtract noise from an image (or from sound files, or blur from a video). In the process of arbitrarily removing noise, vague clusters of image patterns, sounds, or motion appear, which the model then uses to structure further noise reduction.
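If it helps to see the mechanics, here is a toy sketch of that loop in Python. It is not any real model’s code; the blur below is just a stand-in for the learned denoiser, and the sizes and step counts are made up for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def toy_denoiser(x):
    # A real diffusion model predicts which noise to remove at each step;
    # this blur is a stand-in that only "sees" smooth, low-frequency structure.
    return gaussian_filter(x, sigma=2.0)

x = rng.standard_normal((64, 64))   # start from pure gaussian noise
for step in range(50):              # reverse process: peel noise away step by step
    guess = toy_denoiser(x)         # whatever weak structure the "model" finds
    x = 0.9 * x + 0.1 * guess       # nudge the image toward that structure

# x now holds vague clusters of pixels rather than per-pixel randomness
```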

As an artist, asking these systems to render noise is intriguing because it’s a paradoxical request: the model is removing noise in the direction of producing something that looks like noise. It’s this posture that is interesting to me, because we’re circumventing all of the messy training data and its related issues in order to produce something unique to the system. Something that looks like the following:

Sometimes, noise prompts introduce interesting results that tell you something about the system. When I asked Midjourney 6 for an image of noise, it responded in an unusual way. Rather than giving me swirling, pixelated abstractions as it often does, it gave me an image of R2-D2.

This suggested something to me about how Midjourney is calibrated, and how, even in noise, it can find training data. It led me to think more carefully about how images are produced by these systems. Midjourney often finds faces and other objects in noise prompts, almost as if imposed by a secondary system designed to deliver stronger outputs. That these systems may default to the intellectual property of Disney is also telling. A common defense is that you get these images by asking for them, and that this is evidence of the user misusing the system to generate IP violations. But this was proof that IP gets into every nook and cranny of the prompt window.

Notably, the most interesting noise abstractions come from lower pixel counts, because there is a tension between what the model can write into a jpg and the granularity of digital noise. In other words, the model can’t write every pixel independently at random, because it doesn’t have the resources to do that; it has to find clusters of pixels at whatever coarser scale it actually works at. As models get more detailed, they will eventually become better at writing noise.
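The rough arithmetic behind that claim, using Stable Diffusion 1.x numbers as an assumed example (other models use different but similarly lopsided ratios):

```python
# Rough arithmetic for why per-pixel noise is out of reach. These are Stable
# Diffusion 1.x numbers, used as an illustrative assumption; other models differ.
image_size  = 512                    # output resolution in pixels
latent_size = image_size // 8        # the VAE downsamples by 8x -> 64x64 latent

pixels_in_jpg  = image_size * image_size * 3    # 786,432 RGB values in the output
values_written = latent_size * latent_size * 4  #  16,384 latent values the model sets

print(pixels_in_jpg / values_written)  # ~48 output pixels per value the model controls
```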

OK, so I’ve explained why I do it and how it works. But there’s a valuable warning embedded in Paul Pangaro’s question: how do we know when the machine is actually making noise (as in, generating noise from patterns in the training data linked to noise), as opposed to breaking down, with noisy images being the result of that breakdown? How “pure” is the break?

Verifying Breakage

Three things have convinced me that this is a genuine breakdown.

  1. Data Auditing. The images tagged as noise in LAION 5B bore no resemblance to the images of noise produced by Stable Diffusion. This could change, and I can’t see what new images are included in new training batches. But it was a clue. (A sketch of this kind of caption audit follows this list.)
  2. Transportability. Noise prompts generate wildly varied results, and they do so across diffusion systems. Audio, video, and images break down in unique ways with the release of nearly every new model.
  3. ChatGPT said so. OK, this may be the least validating option. But! Image generation in GPT-4 is chat-driven. In various attempts to get the system to produce an image of noise, ChatGPT produced images that were not noise — pictures of bucolic landscapes, etc. — and presented them as if they were. This is evidence of a system breakdown. What’s more, when I told the model that it had presented an image of noise that was not, in fact, noise, the model re-attempted the prompt and produced additional images that were not noise. Eventually it responded that it could not create images of noise, given the architecture of its image generation system.
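For the data auditing step, the check was essentially: what do the images captioned with “noise” in the training set actually look like? A minimal sketch of that kind of caption audit, assuming the Hugging Face mirror of a LAION subset (the dataset id and column names below are assumptions, since the real releases are spread across several parquet dumps):

```python
from datasets import load_dataset

# Stream a LAION subset's metadata and pull rows whose caption mentions "noise",
# to eyeball what the model may actually have trained on. The dataset id and the
# uppercase column names are assumptions about the Hugging Face mirror.
ds = load_dataset("laion/laion2B-en", split="train", streaming=True)

noise_rows = (row for row in ds if "noise" in (row.get("TEXT") or "").lower())

for _, row in zip(range(20), noise_rows):   # look at the first 20 matches
    print(row["URL"], "|", row["TEXT"])
```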

When is a system not broken?

Image models are relatively straightforward to break in this way, but sound is a different story. The key complication in “breaking” sound generation models is that noise refers to a technical process, but also to a genre that would be tagged in training data. In Suno, a music-generating diffusion model, noise prompts produce a variety of sonic effects: tape hiss, distortion, and drones. These are nice, but likely also a result of the model operating as intended.

But push the model far enough and you get a kind of model collapse. The sound starts to skip like a broken CD and doesn’t recover or return to the musical patterns that occurred before it. (This glitching occurs at the end of “500,000 JPGs” and is why I kept it in.)

Another not-broken aspect of these systems shows up in video synthesis. Video synthesis produces a lot of errors: bodily distortions, impossible physics, and so on. When a video model produces an image of a woman turning into three different women, I don’t consider that a deliberate break on my end. I see it as an inherent limitation of the system. You can find ways of emphasizing these distortions, which can be useful for understanding the model.

For example, the bird in the video below isn’t “glitched” or “broken.” It’s just pushing the limits of how Gen3 can render the motion of a flying bird. The background, however, is a product of prompting for a specific kind of white noise that the model can’t produce. It’s too early for me to say for sure with video, but in images, rendering noise prompts often exaggerates the limitations of the model: a model already bad at rendering hands becomes worse at it, and so on.

Sometimes you don’t have to engage in any kind of adversarial behavior for this to come out of the system. In some sense, the system is just already inherently broken! For example, it makes birds fly weird, or turns dancers’ heads into limbs. But this “breakage” is not something that results from my input. It is in fact not “breakage” at all, but just what the system does, an artifact of how it orders data and the logic applied toward the automated animation of that data.

I’m typically less interested in work that just sort of says “oh look at how the AI can’t make Will Smith eat spaghetti” because then, like, five months later a new model comes out and Will Smith is eating spaghetti just fine. Pointing to the technical failures of the models isn’t as interesting to me as finding ways to induce technical failures.

But sometimes we can break it even further, and introduce aspects of breakage that help push the internal limits to the forefront. With the Gaussian-noise feedback loops, we sometimes see elements of the training material “survive,” such as birds, but in the confusion, the model animates them especially poorly.
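For what it’s worth, the basic shape of one of those feedback loops looks something like the sketch below. The video works here were made with hosted tools like Gen3, which don’t expose this kind of loop directly, so this is a stand-in using an open image-to-image model; the model id, noise level, and strength are all assumptions.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Gaussian-noise feedback loop: mix fresh noise into each frame, push it back
# through an image-to-image pass, and repeat. All parameters are illustrative.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("seed_frame.png").convert("RGB").resize((512, 512))
frames = []
for i in range(24):
    arr = np.asarray(frame).astype(np.float32)
    noisy = np.clip(arr + np.random.normal(0, 40, arr.shape), 0, 255).astype(np.uint8)
    frame = pipe(prompt="white noise, static",
                 image=Image.fromarray(noisy),
                 strength=0.5).images[0]        # feed the noisy frame back through
    frames.append(frame)
```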

But I think this is harder than it used to be. I have my doubts about whether noise prompts work in generated video, as above, in the same way they have with images. But I am experimenting to find out. I’m also sharing that work as I go.

Social vs Technical Breakage

“Posture” has a negative connotation, like posing. But a posture is also a fighting position, or a position a dancer takes in their art. To me, the adversarial posture is a mindset, and it’s a mindset that informs a set of gestures and interrogations of models. Some artists think of AI as a collaborator, but I am unwilling to give it that degree of agency. It’s OK to be against an algorithm; it’s OK to question it in unproductive ways that don’t necessarily aim to “improve” it.

Sometimes breaking a system means overcoming the constraints set by the companies that make the model available to us. For example, using AI to critique corporate AI logic is a form of breaking that is more a social break than a technical one. I see the platforms associated with AI as an extension of the organizing logic and the epistemological reorientation produced by AI.

Case in point: so many AI film festivals have members of the boards of AI companies on their judging panels. It means that, basically, critical artwork gets shut out of the system, or at least shut out of being shown at AI-friendly film festivals. What rises to the top, with rare exceptions, are product demos. Consider that a product demo for OpenAI’s Sora won a Golden Nica at Ars Electronica this year — an infinite dolly shot made because Paul Trillo was invited by OpenAI to show people what their new model could do.

Social breakage also means being aware of, and trying to surface, the social challenges of AI and, to some extent, tapping into the AI spectacle to do so. This can be slippery and sometimes uncomfortable for me as an artist. Was “500,000 JPGs” a critique of a certain kind of use of AI, or was it a demonstration of what AI is capable of doing?

Sometimes the answer is both. My hope with “500,000 JPGs” is that it showed the limits of the tools but also the possibilities. The difference is that I am also trying to raise the question of what the hell these possibilities actually mean. If we have the possibility to create 500,000 JPGs that sit on a hard drive, who cares?

I say this not to rehash the idea of the music video again, but to suggest a distinction between the forms of breakage. Using a fairly state-of-the-art music generation model to produce a song that I actually want to listen to may, in some way, affirm the promise of AI as a technical wonder. But using that same tool to frame a deeper question about the meaning of endlessly generated media is, I hope, a kind of Trojan horse. Sometimes it lands, sometimes it doesn’t.

Not all of it is inherently inscribed with a message of resistance, either. The works shared above are visual experiments with varying degrees of deliberate critique embedded in them. It may be that not every work checks all of the boxes. That’s fine. I don’t want to make work with checkboxes. But the checklist sometimes determines the work I share and how I share it.

The video above is a snippet from a larger work in progress, where I am thinking about the metaphor of diffusion as it relates to actual human memory. This is a kind of logic embedded into these systems. If the machine is “learning,” then it is “remembering,” but what it “remembers” is a trace of an experience it has deliberately decayed. The sources that the training process deteriorates aren’t memories, but media materials, including home video footage and private photos and texts. That’s always disturbed me, and compelled me to want to think about that fuzzy, near-obliterated point of memory. So in “An Ablation / An Ablution,” there’s actually no generated video, though a kind of machine learning model was used to transform the home video footage into silhouettes.
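The piece doesn’t name the model, so here is a rough sketch of the kind of step involved, using rembg as an assumed stand-in for “a kind of machine learning model” that reduces footage to silhouettes:

```python
from PIL import Image
from rembg import remove   # off-the-shelf background removal

# Cut the figure out of a frame, then flatten the cutout to a solid shape.
frame = Image.open("home_video_frame_0001.png").convert("RGB")
cutout = remove(frame)                                     # RGBA cutout of the figure
alpha = cutout.getchannel("A")                             # where the figure is
silhouette = alpha.point(lambda a: 255 if a > 32 else 0)   # solid white on black
silhouette.save("silhouette_0001.png")
```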

Otherwise it’s just archival home video footage and noise and my own text, reflecting on AI and AI systems, but also deliberately not mentioning them, because honestly I am a little sick of making work about artificial intelligence all the time. Memory, and the mediatization / digitization of memory, is much more interesting to think about. AI is tangential to that.

Aesthetic Breakage

The final form of breakage is the aesthetic break. To me, this is the idea of moving beyond the aesthetic constraints of AI. This is a fusion of technical and social breakage, because it means manipulating the technical system to assert a different kind of output than what was intended by its makers. For me, this aesthetic break is less rigid and more “risky” from an anti-AI perspective, because it affirms the possibilities of new forms of style from the machines.

The resistance in aesthetic breaks is more of a comment on the affordances of these systems. They are designed to compel us toward certain imaginaries of images, ones that rely heavily on references to photography, illustration, and cinema. The aesthetic break is an attempt to say that AI systems, in fact, have a distinct lineage, one rooted in cybernetics, feedback, and the artist’s orientation — simultaneously within and beyond the system of concern. If these works look like video art, data glitches and so on, that’s because I am trying to acknowledge the artistic lineage of generative systems rather than pretend these models have a continuity with film, photography or illustration.

The aesthetic break is optimistic. It suggests that we can repurpose a system toward our own uses, rather than being constrained by the streamlined simplicity of corporate user interfaces to achieve corporate, job-replacing ends. The things I make will never put anyone out of work. I have always been an experimental digital artist, and this simply extends the use of these tools into that territory. In the sequence above, you see me taking deliberate glitches, feedback loops and other results of applied misuse of the system and treating the outcomes as collage materials. Some of these include “successful” renderings of images — silhouettes, dancers, flowers, etc. — which would otherwise have been, in the tradition of digital art and piracy, appropriated from some other source.

I think human artists have the right to sort through visual and sonic debris and transform it into something else, something radically different rather than derivative. I am not fond of the corporate churn of human art. I don’t think it’s hypocritical to use this work in this way, but I am sure some would say it is. I think those people are the same “allies” who would have told me that my collage work wasn’t “real art” because I didn’t draw it by hand. But ultimately I am just dissatisfied with these AI outputs because they have such a corporate b-roll feel to them.

The aesthetics I explore emerge from a particular pairing of resistances: first, glitching systems, in order to find textures and patterns that the system is not oriented to give me; second, critiquing the logic of the systems that orient me toward certain visual outcomes in the first place. That logic is one of central tendencies, data patterns, prediction and control, inherent to the structure of data-driven predictive analytics. It is also the social context of deployment: surveillance, data extraction, corporate hype, environmental damage (which I am hopelessly complicit in). It is also the logic of control, in the sense that users are offered limited off-ramps to explore the noisy peripheries of what the system can do.

I don’t want to be self-aggrandizing about my work, or to overstate the radical nature of the aesthetics I get as a result of it, or (god forbid) assert that it is the only way to make work with AI. It’s just a thing I do, to help me make sense of a thing I study, and I find it rewarding to think about these systems through these lenses. Making work the way I do helps me push through these relationships and come up against the tensions between social, technical, and design constraints in a real-world practice.

If you’re keen to see more of this kind of thing, I’m looking to post more of it to my Instagram account, so maybe come check it out.