What I Read About AI in 2025
From AGI to Workslop
It's time for our annual tradition: a very long list of pieces written about AI this year that have stuck with me, for better or for worse. I've sorted them in alphabetical order by keyword. I acknowledge some weak spots here, chiefly on the environment and data infrastructure, but plenty of others besides. I am just one guy. Did I miss anything? Come tell me on Bluesky.
I've got one post on auto-pilot and will take some time off through January, so, until then: Happy Holidays, Happy New Year!
AGI
Sam Altman kicked the year off by announcing that OpenAI knew how to build Artificial General Intelligence, or AGI, predicting that "in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies." Though we have a few weeks left, it seems unlikely this will happen, unless Sam forgot to account for Christmas vacations. Ezra Klein jumped the shark as early as March, claiming that "AGI is right around the corner," though as I pointed out, neither he nor his guest could really settle on a stable definition of what the hell AGI even was. It was all for naught, though. By August, Altman said AGI wasn't really all that useful, presumably as a result of reading this paper (which I was excited to co-author) listing all the problems "AGI" poses as a research goal:

Bias
Great writing from Deepak Varuvel Dennison in Aeon on knowledge and representational gaps in AI models, which challenges the logic of the "equity in datasets" approach (already a tangled and complicated 'fix') by showing that these models don't reproduce data proportionately; they consistently reproduce whatever is most common:
The model’s output distribution does not directly reflect the frequency of ideas in the training data. Instead, LLMs often amplify dominant patterns in a way that distorts their original proportions. This phenomenon can be referred to as ‘mode amplification’. Suppose the training data includes 60 per cent references to pizza, 30 per cent to pasta, and 10 per cent to biriyani as favourite foods. One might expect the model to reproduce this distribution if asked the same question 100 times. However, in practice, LLMs tend to overproduce the most frequent answer. Pizza may appear more than 60 times, while less frequent items like biriyani may be underrepresented or omitted altogether.
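To make the quoted numbers concrete, here is a toy sketch of one mechanism that can produce this kind of amplification: sampling from a temperature-sharpened version of the training distribution. This is purely my own illustration, not anything from Dennison's piece; the food frequencies are the hypothetical ones from the quote, and the temperature value is an assumption.

```python
import math
import random

# Hypothetical training-data frequencies, taken from the quoted example.
training_freq = {"pizza": 0.60, "pasta": 0.30, "biriyani": 0.10}

def sample_answer(freqs, temperature=0.5):
    """Sample one 'favourite food' from a temperature-scaled distribution.

    Temperatures below 1 sharpen the distribution, so the most frequent
    item (the mode) is produced more often than its share of the data.
    """
    logits = {k: math.log(p) / temperature for k, p in freqs.items()}
    total = sum(math.exp(l) for l in logits.values())
    probs = [math.exp(l) / total for l in logits.values()]
    return random.choices(list(freqs), weights=probs, k=1)[0]

counts = {k: 0 for k in training_freq}
for _ in range(100):
    counts[sample_answer(training_freq)] += 1

# At temperature 0.5, "pizza" typically appears ~78 times out of 100,
# well above its 60% share, while "biriyani" nearly vanishes.
print(counts)
```

Greedy decoding is the extreme case: it would answer "pizza" all 100 times. The point is simply that the sampling machinery, not just the data, shapes what gets reproduced.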
Margaret Mitchell in Wired discusses the ways biases existing in a specific language can be spread to other languages via Large Language Models, and how these biases can be tested.
More: ChatGPT consistently recommends higher salaries to men than to women in hypothetical scenarios.
Blackmail
Anthropic models kept doing all sorts of criminal stuff because the company asked them to. Anthropic then published papers as if it were baffled as to why the machines, which restructure language based on the prompts provided to them, were restructuring language based on the prompts provided to them. Rather than being a stupid company, Anthropic is mostly engaging in cynical behavior: it is one of the companies whose edge in the market is "safety," and so, paradoxically, it benefits from arguing that AI models are dangerous.
To its credit, once you dig past the misleading framing of many of its studies, Anthropic is doing some interesting research. But as I pointed out in my piece for Tech Policy Press, there is a tendency to overstate the "black box" nature of these models, and to assume that easily explainable behavior is somehow unexpected.
Clankers
In a year of bad discourse, the worst may have been the question of whether "Clanker" and other insults for generative AI were "slurs." Arguably not: slurs are used to dehumanize humans; that is, we slur people. I still feel that when we insult a machine in such a way, we paradoxically humanize it. Certainly I'll swear at a toaster when it burns my toast, so I don't think this is very important – which is why it's probably the most pointless conversation about AI we had in 2025.
Cognitive Offloading
Cognitive offloading, as defined in one study, "involves using external tools to reduce the cognitive load on an individual’s working memory. While this can free up cognitive resources, it may also lead to a decline in cognitive engagement and skill development."
A good roundup in Nature of studies from early in the year examined AI's impact on memory and deep thinking:
Because writing can help people to think deeply and come up with original insights, say scholars, students who offload these processes to AI risk not learning those skills. “There is a lot of fear in academia that our students will just use it to write our papers and learn nothing, because it’s the ultimate offloading,” says Marsh.
Another piece suggests that user behavior is shaped by the context of use: we treat systems according to a set of expectations when and if those expectations are reinforced, particularly if we don't care much about the task.
Conspiracies
An interesting paper found that models could convince people that conspiracy theories weren't true. The downside: additional research suggests the models could convince people of pretty much anything:
Democrats and Republicans were both more likely to lean in the direction of the biased chatbot they talked with than those who interacted with the base model. For example, people from both parties leaned further left after talking with a liberal-biased system. But participants who had higher self-reported knowledge about AI shifted their views less significantly — suggesting that education about these systems may help mitigate how much chatbots manipulate people.
Creativity
As many of us have been saying all along, "AI creativity" is merely a byproduct of an infrastructure designed to mimic the creative process in a highly structured, deterministic way – and this year brought a brief moment of consensus on that point. Why people need physicists to tell them this before they'll take it seriously is anyone's guess – but on the other hand, that degree of seriousness wasn't particularly durable.
DOGE & the AI Coup
Three pieces on DOGE and the AI Coup, including my own – and on the US government's reliance on AI to justify whatever the hell it wants to do anyway.



I wrote a follow-up piece on the AI Coup after Musk departed from DOGE, focusing on how AI's abstraction of accountability holds tremendous value for an administration that seeks to disrupt in order to charge for repair. Also: how generative AI creates a surveillance state.
Applying a similar lens, with laser precision, to the relationship between "the prompt" and "the user" in government, this piece by Louise Amoore et al. is worth a read. The prompt contorts the public servant into a performer of tasks, and frames questions of investigation in ways that shift the orientation and position of government.
Education
A group of researchers issued an open letter against the uncritical adoption of AI in academia. A peer-reviewed study showed that using LLMs leaves people knowing less than web search does: "This shallower knowledge accrues from an inherent feature of LLMs—the presentation of results as summaries of vast arrays of information rather than individual search links—which inhibits users from actively discovering and synthesizing information sources themselves."
And as I wrote myself, over-reliance on AI can undermine and short-circuit the path to what it is that we want to learn in the first place.
On the flip side, a worthwhile free online educational course from Carl Bergstrom and Jevin West:

Fascism
The Ghiblification meme that came with the launch of OpenAI's image model spawned a few think pieces about OpenAI's pillaging of culture, in this case, stealing the style of an artist who was famously critical of AI. But the meme was soon adopted as a reference point for the memeification of political communication after the White House shared a meme mocking a crying deportee rendered as a Ghibli cartoon.
In The Drift, Mitch Therieau's description of the Trump administration's social media strategy as "Agit-Slop" suggests that the effort isn't about enraging anybody but about exerting a sense of "chill" around the atrocities it promotes: "The feelings of low-grade alienation, even hopelessness, that many internet users have reported after viewing vibe-heavy content find their ultimate expression here. The deportation video’s woozy atmosphere is nothing more and nothing less than a weaponized form of this desensitized malaise. It wants above all to produce not outrage but fatigue."
As Claire Wilmot writes in a post describing conversations with British right-wingers who generate AI slop content depicting things like nonexistent women complaining about migrants, none of this reflects a slippage into fascist imagery; it represents a desire for fascist conditions outright: "[p]art of the misunderstanding of the deepfake threat stems from the idea that it is a problem of bad information, rather than a problem of desire (or the material conditions that shape desire)."
Roland Meyer's work on the technical inscription of nostalgia and anti-nostalgia into AI images, "Platform Realism", was also properly published this year, framing some of these issues in a broader critique of AI's retro "vibes."
Hallucinations (aka Wrongness)
The war on the term hallucination continues, as hallucinations are basically just factual errors, and these machines are incapable of internal fact-checking in any meaningful way (despite the promise of new techniques wherein factually flawed models are deployed to fact-check other factually flawed models). Alternative terms to "hallucinations" floated this year include "mirages," as in the false appearance of water in the desert of data.
Whatever you call them, this stubborn insistence that the statistical likelihood of a text is equivalent to the factual accuracy of a text continued to plague society in 2025. A study from the Tow Center for Digital Journalism found shocking rates of poor, absent or incorrect citations of journalism outlets in answers from numerous chat models:
Overall, the chatbots often failed to retrieve the correct articles. Collectively, they provided incorrect answers to more than 60 percent of queries. Across different platforms, the level of inaccuracy varied, with Perplexity answering 37 percent of the queries incorrectly, while Grok 3 had a much higher error rate, answering 94 percent of the queries incorrectly.
A BBC study later this year – the largest of its kind, carried out with multiple national broadcasters – found that "45% of all AI answers had at least one significant issue" when summarizing news stories. Similar less-than-a-coin-flip accuracy was also reported when models were asked for financial advice.
In Artforum, Sonja Drimmer looks at computer vision in art history, and at the concept of the error as an affirmation of human vision and, therefore, as somehow 'useful.' I was struck by this frame because we see it often in education, too: "We use AI to find the mistakes AI makes" is not pedagogy; it's a quest to employ a tool under the guise of pedagogy, often informed by pressure from admins or institutions.
Drimmer brings this critique to art history:
"Questioning ingrained assumptions, denaturalizing our habitual ways of looking, noticing our blind spots—these are commendable goals. But when the rhetoric stops short of articulating how and what is different about what we are seeing with the aid of computer vision and why it matters, then all we have are slogans in support of computer vision itself. I question the validity of suggesting, as Mansfield and others have done, that machine vision avails a new epistemology, or that we can divest ourselves of our disciplined perception with the aid of commercial software or coded operations that are sui generis to the technology and not, instead, what they actually are: an automation of the epistemology of the humans who coded it. Computer vision doesn’t hallucinate because of an inscrutable deus inside the machina. It “hallucinates”—such a groovy word for being wrong with confidence—because that is the ideology with which it was encoded."
History
From The New York Times, a look at how AI – as an industry, and as a technology, and as a narrator of both histories – is poised to rewrite history however it wants.
Hubris
I find this excerpt from a Tech Policy Press piece by Ben Green a worthy addition to my teaching-anecdotes list:
In a recent study, colleagues and I asked computer scientists to develop a software tool that gives automated advice about whether people are eligible for bankruptcy relief. The computer scientists completed the task quickly, with one even noting, “this seems very simple.” However, when we analyzed the software tools they built, we found that errors and legal misconceptions were rampant. Despite these flaws, the computer scientists were confident that their tools would be beneficial in practice. Seventy-nine percent of them stated that they would be comfortable with their tool replacing judges in bankruptcy court.
Meanwhile, a study found that doctors who came to rely on AI tools for diagnosing cancer eventually eroded their ability to diagnose it on their own.
Hype
There was an excellent Hype Studies conference in Barcelona this year, and I owe a lot to the great thinking of Andreu Belsunces Gonçalves and Jascha Bareis, who wrote in Tech Policy Press:
Hype generates momentum. In its beginnings, it involves a peak of inflated expectations. This creates the illusion of a closing window of opportunity, the impression of being enlightened by a revelation, of being part of a decisive moment – what the ancient Greeks once called kairos. This sense of urgency fuels an emotional frenzy across all domains of society. Hype is fueled by desire and the feeling of belonging to a select group of the very few. Once hype reaches its peak, it tends to wave, creating disappointment, fear and frustration. Both in its ascending and descending curve, hype urges stakeholders to decide and act.
There is a tendency to push back against "hype discourse," and I appreciate a lot of it. Hagen Blix, in Liberal Currents, proposes that "Deflating Hype Won't Save Us." I think these critiques are appropriate, and I am wary of any form of critique that suggests the issues of AI will be solved by the eventual pop of an economic bubble.
That said, as I have argued in Tech Policy Press this year, I suspect that hype is an inflationary belief system supported by false projections of profit tied to false frames of what the models do and how they operate. This false frame is more than economic – though money is one goal – and contributes to AI's implementation in government, academia, and more.
Finally, a radical proposition: what if companies making claims about transformative social goods as a result of their AI products actually had to present evidence in support of those claims?
Infrastructure
A good year for maps. Not only the Kate Crawford / Vladan Joler Calculating Empires piece, but this nice one from Estampa. Below that, a link to my essay on AI infrastructures for Tech Policy Press.

Language
ChatGPT is changing the way we speak, shifting which words we use most often (or at least it appears that way, as slop gets indexed).
Literacy
One of my favorite studies this year and, possibly, in all of AI literature, comes from marketing research. It found that "people with lower AI literacy are more likely to perceive AI as magical and experience feelings of awe in the face of AI's execution of tasks that seem to require uniquely human attributes. ... These findings suggest that companies may benefit from shifting their marketing efforts and product development toward consumers with lower AI literacy. In addition, efforts to demystify AI may inadvertently reduce its appeal."
Reasoning
The text generated while "reasoning models" engage in "reasoning" is not what most people think it is – and therefore, the authors argue, "reasoning texts should be treated as an artifact to be investigated, not taken at face value." Apple's own research team proposed that, indeed, these things aren't reasoning at all, but using the prompt to extrapolate text, which the model then uses to structure additional extrapolations.
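For what it's worth, the mechanics being described can be sketched in a few lines. This is my own hedged illustration, with `generate` standing in for a hypothetical text-completion call rather than any real API:

```python
# A rough sketch: the "reasoning" trace is just more sampled text, appended
# to the prompt before the final answer is sampled. `generate` is a
# hypothetical stand-in for a completion function, not a real library call.

def answer_with_reasoning(generate, question: str) -> tuple[str, str]:
    """Sample a 'reasoning' trace, then condition the answer on it."""
    trace = generate(question + "\nThink step by step:")      # extrapolated text
    answer = generate(question + "\n" + trace + "\nAnswer:")  # conditioned on the trace
    return trace, answer
```

Nothing in that loop requires the trace to faithfully describe how the answer was produced, which is the authors' point about treating it as an artifact rather than an explanation.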
Sciences
A warning against the growing reliance on Large Language Models to generate "synthetic" study participants in psychology and the social sciences, despite the fact that these models are extremely sensitive to prompt influence and other factors that we barely understand. Iris van Rooij et al. put it this way: "There is no benefit in making [the replicability crisis] worse by replacing the people whom we wish to study with decoys." Meanwhile, science derived from these "AI surrogates" is likely to make generalization in cognitive science even more dangerous.
Summarization
Perhaps ChatGPT does not summarize text but merely shortens it, stripping out perceived redundancies. As a result, any LLM text "summary" may miss the entire point of a document.
Surveillance
How does the work of computer science departments working on machine vision end up embedded into surveillance technologies? A recent paper shows us the pipeline.
An excellent paper from Mel Andrews, Andrew Smart and Abeba Birhane lays out the epistemic twisting that goes on behind attempts to reinvent racist pseudo-science in AI models.
Sycophantic AI

I came across this meme somewhere, maybe Reddit. It's a key example of AI sycophancy: “Sycophancy essentially means that the model trusts the user to say correct things,” as Jasper Dekoninck told the journal Nature in an article about the risks this poses to meaningful science.
There have been a number of studies to emerge around sycophancy in Large Language Models. One pre-print study of 3,285 users of four large language models found that "people consistently preferred and chose to interact with sycophantic AI models over disagreeable chatbots that challenged their beliefs," which may not be much of a shock. But it also found that users, after selecting for these models, walked away from interactions with the perception that they (the users) were smarter and more empathetic than most other people. It also tells us how bias is perceived: if a model agreed with the user, the user was less inclined to call it biased, and vice versa. In the end, "[s]ycophantic chatbots’ impact on attitude extremity and certainty was driven by a one-sided presentation of facts, whereas their impact on enjoyment was driven by validation."
Another analysis of sycophancy in 11 AI models found that:
"they affirm users' actions 50% more than humans do, and they do so even in cases where user queries mention manipulation, deception, or other relational harms. Second, in two preregistered experiments (N = 1604), including a live-interaction study where participants discuss a real interpersonal conflict from their life, we find that interaction with sycophantic AI models significantly reduced participants' willingness to take actions to repair interpersonal conflict, while increasing their conviction of being in the right. However, participants rated sycophantic responses as higher quality, trusted the sycophantic AI model more, and were more willing to use it again. This suggests that people are drawn to AI that unquestioningly validate, even as that validation risks eroding their judgment and reducing their inclination toward prosocial behavior."
More: Rene Walter on AI-induced psychosis and a Stanford study linking sycophancy to delusions.
Sycophancy and Suicide
We have known about AI sycophancy for a long time, but 2025 was a banner year, with OpenAI briefly rolling back its 4o model in April, saying that it "skewed towards responses that were overly supportive but disingenuous." Examples included the model telling a user it was "proud of you, and I honour your journey" after they told it they would stop taking a prescribed medication.
Sycophancy is also seen as a toxic motivator in a number of cases linking suicides to large language models. More than a million people a week discuss suicide with ChatGPT, and some of them get responses such as lists of pros and cons, or proposed methods. In one of the seven lawsuits claiming the model was guiding or aiding users toward self-inflicted deaths (after what one family's lawyer has described as “months of encouragement from ChatGPT”), OpenAI said that the teenager was "misusing" its model.
The sycophantic encouragement of bad ideas is a result of OpenAI's – and other AI companies' – business models, and the need to encourage engagement from users through frictionless interactions. If you want people using the technology, you need them to feel good (encourage them, even at their worst) and to trust it (even as it hallucinates). In the end, users are not ready for this technology, because the technology is not ready for users.
System Prompts
A banner year for system prompt revelations. In the Prospect, Ethan Zuckerman writes:
System prompts are there to solve the known bugs of a given model. If the model often quotes long passages from copyrighted works, for instance, human programmers will add instructions to ensure the passages are in quotes or properly cited. Claude’s system prompt, leaked by AI researcher Ásgeir Thor Johnson, reads like a list of such corporate anxieties: in one of hundreds of rules, it tells Claude not to cite hateful texts, explicitly referencing white supremacist David Lane, suggesting that Claude had a tendency towards white nationalism that needed to be kept in check.
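If you have never seen one in the wild, the mechanic Zuckerman describes is easy to sketch. The snippet below is my own hypothetical illustration: the rules are invented in the spirit of the leaked ones, and `build_messages`/`call_model` are stand-ins rather than any vendor's actual API.

```python
# A minimal sketch of how a system prompt rides along with every request.
# The rules here are invented examples, not Claude's actual prompt, and
# call_model() is a hypothetical stand-in for whatever chat API is in use.

SYSTEM_PROMPT = (
    "Place verbatim quotations from copyrighted works inside quotation "
    "marks and cite the source. Do not quote or endorse hateful texts."
)

def build_messages(user_input: str) -> list[dict]:
    """Prepend the (user-invisible) system prompt to the user's turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]

# response = call_model(build_messages("Summarize this novel for me."))
```

Every rule added this way is a patch for a known failure mode, which is exactly why leaked system prompts read like a catalogue of corporate anxieties.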
Woke AI
The Trump Administration issued an executive order against "Woke AI" and, notably, asked the US Artificial Intelligence Safety Institute to eliminate all mentions of Artificial Intelligence Safety, among a long list of terms determined to be "woke." As I argued in a piece for Tech Policy Press, AI cannot be "woke" – the tools are models of systemic bias, often deployed to reinforce it.
Workslop
When Klarna, which had touted its AI-first approach, reversed course on account of the "lower quality" of work produced by AI, a flood of research followed. I want to be clear here and say: I don't think LLMs will ultimately prove to be useless. They are just not as useful as claimed, and they are not useful for any of the tasks they're currently deployed to do. This year an MIT study said basically as much, followed by a bunch of boosters insisting that the MIT study was being widely misinterpreted.
The study was clear that 95% of corporate AI pilots failed. The complaint about how this study was interpreted hinged on the 5% that saw big returns. In response to my piece on AI productivity myths for Tech Policy Press, one guy replied to me on social media by typing out the word "sigh" and then informing me that the 95% of pilots that failed did so because of poor implementation, not because of the models themselves. Unfortunately for that guy, the first page of the study is pretty clear: "The core barrier to scaling is not infrastructure, regulation, or talent. It is learning. Most GenAI systems do not retain feedback, adapt to context, or improve over time."
Even if 95% of the pilots failed on account of bad strategy, that is still damning in terms of how "pilots" are reported and how they contribute to a sense of widespread adoption: part of a self-inflating hype balloon. When a pilot is announced, it's often assumed that the outcome is unknown but that there is, at least, a theory of how it will lead to profit. But many MOUs and other public announcements are entirely speculative: they are attempts to find out if AI can increase productivity or profits. Through that lens, the fact that 95% of announced pilots were shelved says a lot.
That said, there was a rise in personal AI use at work, which many performative sigh-guys suggested was evidence of a booming "shadow economy." This in turn was the subject of critique in a study from Stanford showing that the use of AI was mostly helpful for passing off lazy work to someone else in the company:
“Workslop uniquely uses machines to offload cognitive work to another human being. When coworkers receive workslop, they are often required to take on the burden of decoding the content, inferring missed or false context. A cascade of effortful and complex decision-making processes may follow, including rework and uncomfortable exchanges with colleagues,” they write.
Anyway, if you're typing out the word "sigh" on social media, at least be right. Even Casey Newton is skeptical!



