Sounds Like Music
Toward a Multi-Modal Media Theory of Gaussian Pop
This is a transcript from a seminar delivered on October 23 2024 at the Royal Melbourne Institute of Technology (RMIT) as part of a series of dialogues around This Hideous Replica, at RMIT Gallery. It was presented in association with ADM+S, Music Industry Research Collective, and Design and Sonic Practice. Thanks to Joel Stern for the invitation!
Generative AI and its corresponding media formats seem as if it is not tied to any kind of specificity. It’s one big churning of media into a single media format: text becomes images, images become sound, sound can be video. Such a capacity for translation between media forms is called “multimodal.” It may be tempting to suggest that artists like myself, working in the loose in-between of formats and genres, work the way Generative AI does. But as we'll see, a lot of strings are attached to Gen AI’s breakdown and reassembly of media. In particular, the translation for many of these — video, images, and audio, though not LLMs — depends on a diffusion process, the addition and subtraction of noise as a mechanism for steering production. This is a talk about AI-generated music, but I will talk about generative visual AI because AI-generated audio is images. It is, literally, an image of sound.
There's something about images and their referentiality that opens up a whole category of investigation. What is the reference? What is referenced in music is quite different from what's referenced in our understanding of images.
We've always had distinct categories of media theory about these things. We've never really had a multimodal media theory (focused on the common aspects of AI-generated media). And so this is what I'm throwing out there: what if AI-generated media is not media but a hypothetical media that, rather than serving as a reference to the world as media otherwise does, instead represents likely outcomes in a hypothetical population, which are constrained by an analysis of available data, an analysis filtered through infrastructures of media technology and power?
So, in other words, you start by generating noise. And if you think about what noise is, it's a lack of determination for any specific outcome, even technically, in the system. There's nowhere that is pre-selected in noise regarding what it could become. What shapes that noise is this infrastructure of media referentiality — what we have trained on, the politics of training on that data, obtaining that data, funding models, funding, GPU's funding infrastructure, and then power, right? Who gets that money? Who gets to decide what's collected, how it's collected, and how that noise is shaped?
I think it’s worth considering a multimodal media theory of generative AI because if it's all noise, everything can be reduced into noise and regenerated into anything else. There's so much possibility of potential in that. But many people, when they talk about AI, are just like — “Potential! The future is wide open! We can do anything!”
Except that... the noise is being constrained. That's the critical part. You're running the noise and putting it into a tight center: this reference to other things we've already gathered and made. This is just going through the tech. But everything I just said, you're correcting that noise. You're structuring things according to previous structures in music. I mean this quite literally but also philosophically, and these distinctions collapse.
Quite often, the structure of the song is determined by the structures of anything you've prompted. And so I say it doesn't have to be music. It just has to sound like music. That's an important thing. I've been talking about hypothetical images or hypothetical music: music that represents information about a dataset and pictures that are infographics. AI music is a sonification of data in the same way.
TODAY!
ASC Speaker Series
AI-Generated Art: Steering Through Noisy Channels
Online Live Stream with Audience Discussion:
Sunday, October 27 12-1:30 EDT
AI-produced images are now amongst the search results for famous artworks and historical events, introducing noise into the communication network of the Internet. What is a cybernetic understanding of generative artificial intelligence systems, such as diffusion models? How might cybernetics introduce a more appropriate set of metaphors than those proposed by Silicon Valley’s vision of “what humans do?”
Eryk Salvaggio and Mark Sullivan — two speakers versed in cybernetics and AI image generation systems — join ASC president Paul Pangaro to contrast the often inflexible “AI” view of the human, limited by pattern finding and constraint, with the cybernetic view of discovering and even inventing a relationship with the world. What is the potential of cybernetics in grappling with this “noise in the channel” of AI generated media?
This is a free online event with the American Society for Cybernetics.
November 1-3: Light Matter Film Festival, Alfred, NY
I’ll be in attendance for the Light Matter Film Festival for the North American screen premiere of Moth Glitch. Light Matter is “the world's first (?) international co-production dedicated to experimental film, video, and media art.” More info at:
November 1:
Watershed Pervasive Media Studio, Bristol, UK
It's an in-person AND online event, but I’ll be remote — and speaking on the role of noise in generative AI systems. Noise is required to make these systems work, but too much noise can make them unsteady. I’ll discuss how I work with the technical and cultural noise of AI to critique assumptions about "humanity" that have informed the way they operate. A nice event for discussion with an emphasis on conversation and questions.
Through Nov. 24: Exhibition: Poetics of Prompting, Eindhoven!
Poetics of Prompting brings together 21 artists and designers curated by The Hmm collective to explore the languages of the prompt from different perspectives and experiment in multiple ways with AI.
The exhibition features work by Morehshin Allahyari, Shumon Basar & Y7, Ren Loren Britton, Sarah Ciston, Mariana Fernández Mora, Radical Data, Yacht, Kira Xonorika, Kyle McDonald & Lauren McCarthy, Metahaven, Simone C Niquille, Sebastian Pardo & Riel Roch-Decter, Katarina Petrovic, Eryk Salvaggio, Sebastian Schmieg, Sasha Stiles, Paul Trillo, Richard Vijgen, Alan Warburton, and The Hmm & AIxDesign.
Through 2025: Exhibition
UMWELT, FMAV - Palazzo Santa Margherita
Curated by Marco Mancuso, the group exhibition UMWELT highlights how art and the artefacts of technoscience bring us closer to a deeper understanding of non-human expressions of intelligence, so that we can relate to them, make them part of a new collective environment and spread a renewed ecological ethic. In other words, it underlines how the anti-disciplinary relationship with the fields of design and philosophy sparks new kinds of relationship between human being and context, natural and artificial.
The artists who worked alongside the curator for their works to inhabit the Palazzo Santa Margherita exhibition spaces are: Forensic Architecture (The Nebelivka Hypothesis), Semiconductor (Through the AEgIS), James Bridle (Solar Panels (Radiolaria Series)), CROSSLUCID (The Way of Flowers), Anna Ridler (The Synthetic Iris Dataset), Entangled Others (Decohering Delineation), Robertina Šebjanič/Sofia Crespo/Feileacan McCormick (AquA(l)formings-Interweaving the Subaqueous) and Eryk Salvaggio (The Salt and the Women).
If you’re a regular reader of Cybernetic Forests, thank you!
You can help support the newsletter by upgrading to a paid subscription. If you’re already a subscriber, extra thanks! You can also help spread the word by sharing posts online with folks you think would be interested.