Nobody Can Save Us From Imagination

Let's Be Clear About What "AGI Safety" Means

The result of walking OpenAI’s GPT-4 Omni model, step by step, through the actions of a diffusion model generating a human portrait.

Artificial General Intelligence is an advanced version of AI that “matches or surpasses human capabilities across a wide range of cognitive tasks.” Today’s computers are better than most humans at specific tasks: we have systems that can solve extremely difficult math problems through the automation of logic, and we can sort through mountains of data in seconds that would take humans lifetimes to analyze.

Artificial General Intelligence is the belief that, with enough data about the world, we could create a system that doesn’t just perform narrow, focused tasks beyond the capacity of human beings. It looks toward a system that can do “general” tasks in ways that outpace human beings: a system with such mastery of the world that it could find new solutions to problems, test those solutions in simulations, and reveal the best course of action to humanity. This is what OpenAI calls “Super Intelligence,” a form of computation that they argue would radically shift human societies.

Artificial General Intelligence doesn’t currently exist. It might exist someday, but we don’t know when; we are so far from realizing it that we still don’t know whether it is possible. To assume that AGI will exist, you have to assume that the capacities of Large Language Models will continue to scale at exponential rates; that we can quantify enough information about a complex world into data-driven formats; and that there are enough material resources to build and sustain the computational infrastructure that drives it. You would also have to assume that Large Language Models are the key to unlocking a general intelligence: that if a computer can manipulate language, then it can understand language, and that experience of the world and embodied understanding of the world aren’t required to solve its challenges.

These things are far from settled. But at the moment, the price tag for AGI is $7 trillion. If we want to set up a timeline, we could think about how long it would take Sam Altman to accumulate that kind of capital. If Sam raised $100 billion a year, it would take 70 years for OpenAI’s AGI project to get started. That would require the deaths of 35 Elon Musks, more if you assume OpenAI would have to pay taxes on the inheritance (personally, I bet they’d get around it).
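If you want to check that back-of-the-envelope math, it fits in a few lines of Python. The $7 trillion and $100 billion figures come from the paragraph above; the $200 billion stand-in for a Musk-sized fortune is my own rough assumption for illustration.

```python
# Back-of-the-envelope arithmetic for the figures above.
AGI_PRICE_TAG = 7_000_000_000_000   # Altman's reported ask: $7 trillion
ANNUAL_RAISE = 100_000_000_000      # a hypothetical $100 billion raised per year
MUSK_FORTUNE = 200_000_000_000      # assumed, roughly, for illustration

years_to_fund = AGI_PRICE_TAG / ANNUAL_RAISE    # 70.0
musks_required = AGI_PRICE_TAG / MUSK_FORTUNE   # 35.0

print(f"Years at $100B per year: {years_to_fund:.0f}")
print(f"Musk-sized fortunes required: {musks_required:.0f}")
```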

That’s a shocking sum of cash. By contrast, Microsoft is spending $10 billion per quarter to develop more data centers: massive heat-producing buildings with a devastating impact on water supplies that are being built, for some reason, in Arizona. Against $7 trillion, $10 billion every three months doesn’t sound like very much, does it? And that’s the game. That’s the Altman schtick. You promise infinite rewards, warn about apocalypse, and name the largest number you can think of until massive sums seem almost normal. Then you cash the checks.

Humans are susceptible to believing in language when it is structured in ways that aim to convince us. That is the history of media and communication, from entertainment to political propaganda. If you can tell a story that people believe is true, they will become immersed in that story. Sometimes this calls on us to suspend disbelief: to say “I don’t believe this, but I am along for the ride.” Sometimes we believe the story. We say “This makes sense, I am going to vote for this politician.” In any case, the relationship between words and reality is tangential. Sometimes reality aligns with language, but mostly language just does its job regardless.

The story that ChatGPT “knows things” or “understands things” beyond an abstract sense dovetails nicely with a belief in AGI. Chatbots manipulate symbols into stories. If you believe that the ChatGPT bot is talking to you, you’re a prime candidate for believing that AGI is imminent. You have already surrendered to a particular idea of self-awareness in the system, a myth proliferated online and in “sparks of sentience” logic. Sentience is not required for AGI, but if you think the chatbot shows sparks of it, you’re also on your way to thinking superintelligence is within reach.

Artificial General Intelligence is also a story. It is a story told by people with power in the tech industry as a means of justifying their claim to that power. The “risk” of Artificial Intelligence is the same story, attached to a warning: that any shift in power away from the incumbent technology companies will result in chaos and destruction. It behooves us to be skeptical of that story.

A still image from guiding OpenAI’s GPT-4o through the diffusion process manually. It is not, I should note, how diffusion models work at all.

The Existential Crisis

AGI risk storytelling involves speculation about when we will arrive at AGI, sidestepping conversations about whether we ever actually could. The pressure is that we must sort out AGI’s risks in advance, lest someone accidentally create it and it take over or destroy the world.

In niche philosophy departments, this may be a reasonable thing to worry about, kept within a tight sense of overall perspective. Certainly, risks in the distant future merit some time and energy. Containment of global warming, for example, requires us to think on longer time frames. But imagining a scenario with effects similar to unchecked global warming ought to merit fewer resources than solving the actual problems we know we have.

But the research on AI’s most improbable futures is not niche. It has all the signs of a moral panic — unique in that it doesn’t address a human deviant, but the future itself. I am not optimistic about the future AI lays out for us, but it’s worth thinking about this particular focus on abstract risk as a means to mobilize resources against a mysterious future enemy that threatens literally everything we value.

This is now a mainstream idea. It is shifting the ground from thought experiment to urgent necessity, steering policy conversations and reshaping the frames through which we understand and define the risk of AI. Institutes focused on AGI risk, such as The Future of Life Institute, have been granted at least $660 million by crypto billionaires to worry about imaginary problems. Sam Altman, who heads OpenAI, has been asking for $7 trillion to develop AGI “safely.”

“AGI safety” is a coded buzzword. It sits outside of “AI ethics” or “AI justice” or “Responsible AI,” all of which have their own flavors, histories, and perspectives. AGI Safety signifies a focus on long-term risks. It says that you are not concerned with immediate problems unless they connect to scale; your interest is in the long-term effects of imaginary, future systems. The AI / AGI Safety movement bundles these problems with engineering and policy-focused solutions that displace concerns of the present in order to focus on hypothetical issues that might someday emerge.

At the heart of OpenAI’s argument is the claim that nobody else will build AGI safely. That case has become complicated. This week saw two departures from OpenAI’s safety team, which was focused on what OpenAI calls Super Alignment, and the subsequent closure of the team. The work of Super Alignment was largely hypothetical: AGI systems don’t exist, so the team was speculating on what risks might emerge from a system that existed solely in their imaginations, and then engineering technologies to mitigate those risks.

Make no mistake: OpenAI’s safety team did not spend any time considering the present social and environmental dangers raised by existing artificial intelligence systems, including those of its own design. If OpenAI were a sincere organization, steering its own resources toward building ethical AI systems, it would have had incredible tools at its disposal. Instead, this team focused solely on one goal: how to control a hypothetical “rogue superintelligence.”

OpenAI’s much-lauded AI safety team was headed by Ilya “it may be that today's large neural networks are slightly conscious” Sutskever, the chief scientist at OpenAI, who recently left after doubting his own company’s commitment to preventing hypothetical futures. His team was granted 20% of the company’s computational resources. That was not enough for Jan Leike, one of the team’s co-leads, who resigned last week ahead of news that the entire team had been disbanded; a lack of compute time was among the reasons he cited for not believing OpenAI was serious enough about this goal. (Update, May 21, 2024: Apparently, they never got that 20% of compute.)

Some of this is relatable on a human scale. If you sincerely believe that rogue AI is an existential threat, of course it is your overriding concern. If you dedicate your life to that anxiety, only to then create one of the biggest breakthroughs in computing history, of course you confront those fears head on. Your wildest dreams just came to fruition — so of course, you worry that the other shoe will drop. That anxiety is something I can understand. But it doesn’t mean it’s realistic.

To be clear, and fair, there were other teams, too, including the AI preparedness team; you can look at OpenAI’s organizational system around safety here. The risk matrix they were working to mitigate tackled challenges of cybersecurity; chemical, biological, radiological, and nuclear threats (“CBRN”); persuasion (an AGI’s ability to convince us and others to do bad things); and model autonomy (a model’s ability to do bad things on its own).
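For concreteness, that matrix can be pictured as a small data structure. This is a hypothetical sketch: only the four category names come from the paragraph above, while the severity levels and the deployment rule are placeholders I have assumed for illustration.

```python
# A hypothetical sketch of a preparedness-style risk matrix.
# Severity levels are placeholders chosen for illustration only.
RISK_MATRIX = {
    "cybersecurity": "low",
    "cbrn": "low",            # chemical, biological, radiological, nuclear
    "persuasion": "medium",
    "model_autonomy": "low",
}

ACCEPTABLE_LEVELS = {"low", "medium"}  # assumed deployment ceiling

def deployable(matrix: dict) -> bool:
    """A model ships only if every tracked category stays at or below the ceiling."""
    return all(level in ACCEPTABLE_LEVELS for level in matrix.values())

print(deployable(RISK_MATRIX))  # True under these placeholder levels
```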

When I say the AI Super Alignment team had no impact on the direction of ethics at OpenAI, I am not saying that to be mean-spirited. I am saying it because it was, by definition, an attempt to figure out tools for regulating systems that don’t exist yet.

Another still from manually navigating OpenAI’s ChatGPT-4o through a diffusion process. It is not how diffusion works.

Super Alignment: A Performance Review

Assume the best of this camp, for a moment. We can assume the belief in AI risk was sincere. Even if it were, the way the team went about solving it is more symptomatic of the problems with AI than an indication of how those problems might be solved.

For a problem that would encompass the entire planet, you would think diversity of thought, perspective and experience would be essential. How do you solve a dense, interconnected problem of a non-existent complex system by only looking at source code?

Yet the team consisted primarily of white male computer scientists from Europe and the US who insisted that they could sort out these problems for the rest of the world, and do it without asking anyone else. To staff the team, OpenAI hired only Machine Learning (ML) researchers and computer engineers. There were no stakeholders from impacted communities, no sociologists, human rights researchers, or political scientists, and certainly no artists. There were not even information security experts. It was entirely ML focused. And it self-selected: skeptics of AGI and AGI risk would have little interest in spending their lives and talents on a problem they didn’t think was real.

The fruits of that effort were revealed a few months ago: Super Alignment. Super Alignment is the idea that you could have smaller, less intelligent AI systems regulate larger, more intelligent AI systems. It sounds smart, if you buy into the story, but it falls apart under any degree of scrutiny. First, if LLMs can be confused by simple human-issued commands, then they can easily be confused by a “superintelligence.” Second, in what other way would humans ever control a computer system than through programs that interface with it?
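To make the shape of that idea concrete, here is a deliberately toy sketch of weak-to-strong supervision, with ordinary scikit-learn classifiers standing in for models of different capability. It is my illustration of the concept, not OpenAI’s code; the dataset, the model choices, and the sizes are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A synthetic stand-in task; nothing about it resembles language modelling.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "weak supervisor": a simple model fit on a small slice of ground truth.
weak = LogisticRegression(max_iter=1000).fit(X_train[:200], y_train[:200])

# The "strong model" never sees ground truth; it learns from the weak model's labels.
weak_labels = weak.predict(X_train)
strong = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                       random_state=0).fit(X_train, weak_labels)

print("weak supervisor accuracy:", round(weak.score(X_test, y_test), 3))
print("strong student accuracy: ", round(strong.score(X_test, y_test), 3))
```

The interesting question in this setup is whether the strong student can exceed the quality of the weak labels it was trained on; whether anything like that scales to supervising a system smarter than its supervisors is the leap being disputed here.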

It was very much an engineer’s solution to an engineer-made problem: an expansive, complex system coordinated toward containment of a larger complex system, in ways that are impenetrable to human decision-making. It harks back to the need for “human centered design” in that it reproduces the problems of early complex control systems, rows of incomprehensible indicators and switches, but then automates the switches and removes the lights.

The solution was to build smaller, targeted systems that would regulate what the larger system did, preventing its outputs from containing anything that tripped its existential-risk-focused threat matrix.
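In code, that arrangement is just a gate sitting between the big model and the user. The sketch below is a hypothetical illustration of the pattern: the category names, keyword stubs, and threshold are placeholders I made up, and a real filter would be a trained classifier rather than a keyword check.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    scores: dict

# Placeholder keyword lists standing in for trained per-category classifiers.
CATEGORY_KEYWORDS = {
    "cbrn": ["enrich uranium"],
    "cyber": ["zero-day exploit"],
    "persuasion": [],
    "autonomy": [],
}

def score_output(text: str) -> dict:
    lowered = text.lower()
    return {cat: float(any(kw in lowered for kw in kws))
            for cat, kws in CATEGORY_KEYWORDS.items()}

def gate(model_output: str, threshold: float = 0.5) -> Verdict:
    scores = score_output(model_output)
    return Verdict(allowed=all(s < threshold for s in scores.values()),
                   scores=scores)

print(gate("Here is a recipe for banana bread."))   # allowed
print(gate("Step one: find a zero-day exploit."))   # blocked by the "cyber" stub
```

Stack several such gates on top of one another, each itself a learned model rather than a hand-written rule, and you get the layered opacity described below.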

The solution emphasized automated bureaucracy without human legibility: a new black box placed on top of other black boxes. I might call them “Black Locks” around the Black Box: a series of tightly connected, incomprehensible logics regulating one another in ways humans cannot penetrate. This was the solution at the heart of OpenAI’s future.

Which is fine, and is pretty much as far as this thought exercise could currently go. Because, to reiterate, the systems these Black Locks would lock haven’t been invented yet. It’s literally a solution for a problem that doesn’t exist. I don’t know what more could have been done with it, because they certainly weren’t interested in shifting to immediate concerns.

An H-Bomb in Every Pot

This week, two of the people leading that team (Ilya and Jan) resigned from OpenAI, and on Friday the team was shut down altogether. There has been some hand-wringing suggesting this reflects OpenAI’s diminishing commitment to the ethical use of its technology. But we shouldn’t be fooled: OpenAI’s Super Alignment team had zero interest in, and no impact on, the ethical use of its technology in its current iteration. It was a forecasting operation, designed to engineer solutions to imaginary problems, to find ways to intervene in systems that nobody yet knew how to build.

This is not to suggest that OpenAI is going to be a more ethical company without them. It’s going to be the exact same company without them: a company that works almost exclusively on deepfake technologies while lobbying Congress for access to ever-expanding troves of human data to build a thing that, they claim, might kill all of us.

The work of the Super Alignment team was so unimportant to anyone’s actual life, now or for the next 35 lives and deaths of Elon Musk, that its disappearance makes no difference at all. AI’s greatest risk is not an AI spill or a sentience leak. Meanwhile, there are teams of people working at OpenAI who are doing ethics research and thinking through problems of the present. I am not saying they’re going to solve those problems, either. I’m just emphasizing that the people who left OpenAI weren’t those people.

I would argue that even the AI ethics teams at OpenAI are unable to tackle the problems their products have unleashed. They do good enough work with technical band-aids: they fake diverse prompts in image generation, and they are highly restrictive about the kinds of output the models produce. Judged solely on the outputs of the model, OpenAI is not doing that badly.
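One of those band-aids is easy to picture. The sketch below shows the general shape of rewriting image prompts to inject demographic variety; the descriptor list and the rewrite rule are hypothetical stand-ins, not any vendor’s actual pipeline.

```python
import random

# Assumed example descriptors, chosen only to show the mechanism.
DESCRIPTORS = ["South Asian", "Black", "East Asian", "elderly", "young"]

def diversify(prompt: str) -> str:
    """Rewrite prompts about unspecified people before they reach the image model."""
    if "a person" in prompt and not any(d in prompt for d in DESCRIPTORS):
        return prompt.replace("a person", f"a {random.choice(DESCRIPTORS)} person", 1)
    return prompt

print(diversify("A photo of a person reading in a library"))
```

The patch changes what the model is asked, not what it learned, which is what makes it a band-aid.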

The problem is that there are risks well beyond the model’s outputs that require real thinking to solve, thinking that could never be done within the company’s walls. Those risks include the consolidation of wealth and power that AI affords to companies like OpenAI, Google, Meta, Amazon and Microsoft — companies whose control of data has handed them unprecedented power to organize, sort, define and shape the world.

The departure of the Super Alignment team does nothing to advance or challenge that power. If anything, it suggests that OpenAI is on its way to becoming a normal company. If it moves away from this demand (essentially, “shape policy to our whims or we might accidentally destroy the planet”), then we may actually be able to have rational policy conversations about big tech, focused on this concentration of power and on how to constrain its abuse by those who control it. Regulators need to bring those outside the walls into dialogue in order to understand those harms, rather than becoming enamored with the complex science fictions of Super Alignment.

Altman is a known type. He is a salesman, he wants to make money, he wants to control the market, and he sees himself transforming the world. Thanks to a long history of resisting techno-corporate power, we know how to deal with corporate overreach. We don’t need Super Alignment, we need politics: open, participatory systems of airing and resolving grievances.

OpenAI’s weird position is akin to building a commercialized H-bomb: the company frames it as a powerful tool that could destroy the planet while also wanting everyone in the world to have access to it. So its reliance on automated decision-making to curb the risks of automated decision-making, and its marketing gimmick of being the only company safe enough to commercialize these H-bomb equivalents, fit neatly into Altman’s strategy. And that fits the strategy Silicon Valley has long pursued: replace politics with computation, preserve the status quo through data.

Most real harms of AI are obscured by hype. Hype shapes what we talk about, and it structures the resources poured into solving problems: whether we fund teams to solve problems that don’t exist and likely never will, or fund solutions to problems we already have and know how to solve.