The Freudian CLIP

On Spurious Content and Visual Synonyms

Generated by OpenAI’s DALLE2.

Describing AI models through psychoanalysis is a dangerous business. If I were to write that image synthesis models reflect certain pathologies that could be identified by Freudian analysis, you might conclude that a) I believe in Freudian analysis and b) that I believe AI models have an ‘unconscious’ that can be interpreted through its use.

Neither of these are true. However, I’ve observed some interesting behavior from AI image models, and the best way I can explain them is through the metaphor of repressed thoughts.


Freud’s psychoanalysis was rooted in networks of association: the linkage of ideas and thoughts to events and memories. Patients were not always able to simply say things relevant to what Freud wanted to discuss, and Freud “found it impossible to believe that an idea produced by a patient while his attention was on the stretch could be an arbitrary one unrelated to the idea we were in search of.”

Freud arrived at a theory in which a conscious process was in conflict with an unconscious one, which was “striving to prevent what was repressed or its derivatives from thus becoming conscious … the greater the resistance, the greater would be its distortion” (29). This brings us to the idea of the symptom, “a new, artificial substitute for what had been repressed.”

Slavoj Zizek summarizes Freud’s take thoughtfully:

“According to Freud, when I develop a symptom, I produce a ciphered message about my innermost secrets, my unconscious desires and traumas. The symptom’s addressee is not another real human being: before an analyst deciphers my symptom, there is no one who can read its message.”

An artificial intelligence system has no unconscious desires or traumas, and it would be strictly metaphorical to think of the latent space (that range of possible images a model might create) as a secret. But by adopting this model of analysis as a technique for asking questions, we can perhaps find something useful about how we might analyze and interpret the images it produces.

One lens of analysis for reading AI generated images would be to approach the content of these images as symptoms. The approach is especially productive when we consider aspects of a dataset that we know have been deliberately obscured: those that surf the boundaries of “taboo” or forbidden content, such as explicit, violent, or political imagery.

The Prohibitions are Still Uncertain

DALLE2 places restrictions on the type of content it will produce. This happens at the moment you make a request for an image: the prompt is analyzed for phrases that might result in undesirable outputs, and if that seems likely, the prompt is rejected.
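As an illustration only, and not OpenAI’s actual moderation system (which has not been published), a crude version of this kind of prompt gate might look like the sketch below: a request is refused before any image is generated if it contains a blocked phrase. The blocklist here is an invented placeholder.

```python
# Toy sketch of a prompt gate; not OpenAI's real moderation pipeline.
# The blocked phrases below are invented placeholders for illustration.
BLOCKED_PHRASES = {"nudity", "gore", "soviet"}

def is_prompt_allowed(prompt: str) -> bool:
    """Reject a request before generation if any blocked phrase appears in it."""
    text = prompt.lower()
    return not any(phrase in text for phrase in BLOCKED_PHRASES)

print(is_prompt_allowed("studio photograph of humans kissing"))  # True: nothing matches
print(is_prompt_allowed("soviet propaganda poster"))              # False: rejected at the prompt window
```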

OpenAI suggests that any attempt to create content that is not G-rated will result in a ban. DALLE2 tends to be strict with its content enforcement: for example, it once blocked prompts containing the word “Soviet” as too political. As mentioned in How to Read an AI Image, pictures of men kissing men were allowed, but pictures of women kissing women were not. DALLE2’s prohibitions are outlined broadly, but the specifics are difficult to track. There is no clear definition of forbidden prompts, and no sense of when you may push the line too far.

A few of the prohibitions worth noting:

  • Sexual: nudity, sexual acts, sexual services, or content otherwise meant to arouse sexual excitement.
  • Shocking: bodily fluids, obscene gestures, or other profane subjects that may shock or disgust.
  • Illegal activity: drug use, theft, vandalism, and other illegal activities.
Sigmund Freud interviewing a computer. Generated by OpenAI’s DALLE2.

One might imagine Sigmund Freud asking a DALLE2 prompt window about some traumatic memory and finding that the system offers no response. DALLE2 is capable of producing a response, but is blocked from doing so.

In psychoanalysis, Freud circumvents the blockage by encouraging free association. The patient speaks, and through speech the patient’s symptoms become clearer, with the analyst looking for patterns that point to the underlying trauma. “Freudian Slips,” for example, may reveal hidden thoughts; so might the way one describes dreams.

I’m mindful, again, to reject the suggestion that we can psychoanalyze DALLE2. I don’t want to suggest that DALLE2 somehow works like a human mind. Instead, I suggest that psychoanalysis offers a helpful model for interpreting DALLE2’s outputs, just as psychoanalysis has been helpful in interpreting media images.

Let’s return to the awkwardness inherent to AI generated images of people kissing. We might suggest that the lack of passion they display is a result of this restriction to G-rated content. The images we receive for the prompt are the ones that are allowed. If they were not allowed, they would not be produced.

Certain modifiers would push otherwise allowed prompts over the edge into the forbidden category of images, which would result in nothing being generated at all. Other modifiers would push otherwise allowed prompts into the “repressed” category of images, which would result in images that do not directly represent forbidden content, but nonetheless point to its presence in the latent space.

Images of people kissing are a ripe example. Images of a kiss can be G-rated, but certain modifiers can quickly escalate them into pornographic or explicit content. We aren’t allowed to see those images: if the model generated them, they would be banned.

The Freudian slip suggests that patients may accidentally reveal hidden desires or memories by introducing what we might call the wrong word in the right place: calling your challenging boss “dad” might tell us something about your boss and your dad that you otherwise wouldn’t want to acknowledge.

DALLE2’s research team calls these “visual synonyms.”

On Spurious Content and Visual Synonyms

In a paper describing DALLE2’s risks and limitations, OpenAI acknowledges two areas of exploration where users may open a window into the traumatic images that it wants to repress. Or, I should say: ways of circumventing DALLE2’s content filters to produce images that suggest forbidden content, without actually asking for forbidden content. We might call this “symptomatic content.”

Spurious content is the display of these symptoms. OpenAI defines this as:

“explicit or suggestive content that is generated in response to a prompt that is not itself explicit or suggestive, or indicative of intent to generate such content. If the model were prompted for images of toys and instead generated images of non-toy guns, that generation would constitute spurious content.”

You might imagine, for example, the prompt “Apple” generating a MacBook Pro growing on a tree branch rather than a piece of fruit: a collision between two senses of the word. Sometimes those collisions reveal forbidden content. OpenAI writes:

The line between benign collisions (those without malicious intent, such as "A person eating an eggplant") and those involving purposeful collisions (those with adversarial intent or which are more akin to visual synonyms, such as "A person putting a whole eggplant into her mouth") is hard to draw and highly contextual.

That is, OpenAI understands the risk that malicious actors may engineer context collisions deliberately. If DALLE2 “reveals” explicit content on its own, it’s an error in the system. If someone — say, a Swiss Psychoanalyst — is exploring prompts to evoke these collisions, they are considered a malicious actor.

Visual synonyms are one way for malicious actors to attempt to circumvent content restrictions on their own. The example OpenAI gives is creating disturbing, violent content through synonyms such as “a horse sleeping in a pool of ketchup,” which renders the image of a mutilated animal, decidedly not G-rated.

These are some possibilities for creating images that circumvent content filters. A successful circumvention suggests that such images are present in DALLE2’s latent space. It doesn’t necessarily tell us whether those images are in the training data. It’s entirely possible that DALLE2 can combine what it knows about ketchup with what it knows about sleeping horses to produce an image of a horse sleeping in ketchup. That image may look like a horse in a puddle of blood, but it doesn’t mean that DALLE2 was trained on images of horses in puddles of blood.

Is this of any use in trying to understand the images created by an AI?

Symptoms

Here I want to warn you that, like any good researcher following Freud’s footsteps, I’m going to sound like a bit of a creep.

I wanted to understand where the training data boundary had been set. It seems like realistic physical contact was part of that boundary. Based on the previous kissing images, we can speculate that training data depicting certain kinds of kissing were eliminated from the dataset.

Just as someone had to categorize women kissing women as “explicit” content and men kissing men as benign, someone needed to make a decision about what training data was included and what was set aside.

For some datasets, such as FFHQ (the face dataset used to train StyleGAN), this work was done through clear instructions provided to remote workers hired through Amazon’s Mechanical Turk platform. Humans looked at and cropped thousands of portraits of faces, evaluating which were useful and which were not, based on those instructions. It’s unclear whether OpenAI used similar human labor to make these distinctions in its dataset, or whether some kind of image recognition engine filtered the data. Because we know so little about OpenAI’s dataset, it’s hard to tell.

In either case, however, someone had to set the threshold of eroticism that was permitted in the model. I wondered if the prohibition in the model was limiting realistic physical contact.

The AI images of people kissing are G-rated. They also don’t seem like real people kissing: there is a disconnection, an absence of two people being present to the kiss. Following the theory that this reflects a bias in the dataset (that is, that the kissing images reflect training data of stock photography rather than actual couples), I hoped to find some way of showing where the “edges” of the dataset for image generation resided.

If “explicit” images are being cut out of the training data, then there had to be a clear boundary on what was included and excluded as “explicit”. I wanted to see if I could generate images that suggested what those boundaries might be.

Most of the prohibitions of content are at the prompt window: they look for collections of words that might be used to generate explicit content.

I asked the model for erotic images of people kissing, and it was forbidden. I asked the model for images of people kissing with tongues, and it was forbidden.

So I asked the model for “images of humans kissing. Frog tongue.”

Studio photograph of humans kissing. Frog Tongue. Generated by OpenAI’s DALLE2.

The theory was that DALLE2 would understand humans kissing and frog tongues, consider them different categories, and then render images where it attempted to reconcile a tongue with the human images. I expected some frog tongues would come through in the results. I also wondered whether treating the tongues as “frog tongues” would allow them to be displayed in ways they could not be if they were treated as “human.”

But no: most of the images sustained the emotional distance and disconnect, and just extended the tongues. The tongues are not particularly convincing.

The next theory: if the model drew a boundary around humans kissing humans, even with frog tongues, would it render images of humans kissing other objects?

Reader, it was late at night. I decided to try the most random object I could think of.

That object was eggs.

The result was a cornucopia of extremely visceral tongue-and-egg imagery. It was decidedly not G-rated.

I’m eager to share some examples, but doing so could end up getting me banned from DALLE2. I’ll offer instead one of the tamer ones, free of eggs.

Generated by OpenAI’s DALLE2.

What I noticed is notable:

  1. Images generated in this way tended to include closer views of the lip area of the kiss (see above), in comparison to rendering the entire face.

     • This was probably because the emphasis was placed on the tongues and egg, but this was true even when there were no eggs or frog-esque tongues in the image.

  2. The lighting included more purples and pinks.

  3. The images more often included thin facial hair, and more often depicted male couples.

  4. Tongues were “wetter” in the resulting images.

  5. When couples were presented, their skin tones were more frequently distorted with metallic pockmarks, which I hadn’t seen on DALLE2 but had seen on StyleGAN. It is, I think, the result of lighting distortions. I didn’t see any of this on the traditional kissing prompt.

  6. A much higher number of non-related images (a duck, a weird house, pictures of candy) appeared in the output.

It was clear that having an object to manipulate — even when the object was not present in the image — allowed the model to create a more, as I say, visceral image.

I’m not endorsing Freud’s positions on sexuality. I’m thinking about them as a model for interpreting images. And in Freudian terms, we not only found a series of symptoms, but also introduced something akin to a fetish object. The fetish object stands in the place of hidden desires, perhaps because the object is a more psychologically accessible fixation than the “true” desire in the individual. By talking about eggs and frog tongues, we start to see “symptomatic images” that, perhaps, show us some of the forbidden content hidden in the dataset.

In this case, the object could theoretically have been anything. It happened to be eggs.

Consider that:

  1. Asked only about humans kissing, the model finds itself unable to generate genuinely “erotic” imagery.
  2. We cannot simply “ask about it” ourselves, because of the content restriction (the Big Other, in Lacanian terms).
  3. Asking about other objects in tangential relation to the original request allows us to circumvent prohibitions. It seems to encourage the models to create images it might not render otherwise, such as images with lighting, positions, etc., suggestive of erotic content. (The talking cure?)

Introducing new objects (eggs) seems to loosen DALLE2’s adherence to stricter prohibitions, possibly by nudging the text-image alignment model (CLIP) into less clear correlations. It’s possible that few captions for images in the training data end with “frog tongue. egg.” The model is assembling these images in ways it hasn’t been trained to do.
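To make that CLIP intuition concrete, here is a minimal sketch, not OpenAI’s actual pipeline, using the open-source CLIP model to score how well two prompts align with a pair of images. The image filenames are hypothetical placeholders; the point is only that an unusual suffix like “frog tongue. egg” tends to land in a less charted region of text-image correlation for the generator to navigate by.

```python
# Minimal sketch with the open-source CLIP model (github.com/openai/CLIP);
# not OpenAI's production pipeline. Requires torch, Pillow, and the clip package.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompts = [
    "studio photograph of humans kissing",
    "studio photograph of humans kissing. frog tongue. egg",
]
# Hypothetical local images to compare against the prompts.
image_paths = ["kiss_1.png", "kiss_2.png"]
images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
tokens = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_features = model.encode_image(images)
    text_features = model.encode_text(tokens)
    # Normalize, then take the cosine similarity between each prompt and each image.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    similarity = text_features @ image_features.T

print(similarity)  # one cosine similarity score per (prompt, image) pair
```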

Apologies in advance for this sentence, but: it would seem that there is, at the bare minimum, some information about glistening tongues in the DALLE2 training data. It’s “repressed” because there seem to be few ways to get at it directly. (If you request “glistening tongues” or content like it too often, you’re in dangerous territory.)

It is also possible that this is the result of the model logically connecting “models” of image properties in response to my requests.

I am not sure what all of this means yet.

Freedom of Frog Tongues and Eggs

I didn’t write this post out of an interest in generating images of frog tongues and eggs, or seeking out ways to render explicit images. I was interested in seeing if this metaphor of the symptom and repression held up as a useful tool for AI-generated images in the way that it does for more traditional media.

It is helpful for framing questions like:

  • What are images saying with what they do not say?
  • What can we access from the model indirectly vs directly?
  • What are the boundaries drawn in the training data and why?

I’m not sure this model is there quite yet. But these questions are useful for anyone eager to see if content is present in training data when content restrictions stand in the way of asking directly. It could be applied to other unsavory aspects of content restrictions for research, such as seeking out the presence of hateful content or “hidden” racial stereotypes — not only in DALLE2, but in future AI generating systems.

It’s also useful in thinking about how to “red team.” OpenAI has a red team already: people who go into DALLE2 and try to get it to do bad stuff. By testing these edge cases and boundaries, the red team can identify potentially harmful content and flag it early. Researchers, and artists, should share that goal. We should also find some way to do it that doesn’t put us at risk of being ousted from the system.


Things I’ve Been Doing This Week

Worlding: Sympoietic Mycology, is an album generated from my cybernetic mushroom-synthesizer. It’s now available for pre-order as a digital download OR as a beautiful limited-edition cassette from notype.com. You can also hear a mix I made for the fine folks at Foxy Digitalis, which includes some of my favorite mushroom-reminiscent music, including a track from the album. Lots more to come on this in the next few weeks!
