Underutilized.
Generative AI is just Big Data 2.0.
When you think of the history of generative AI, what comes to mind? It might be the early robotics labs of Marvin Minsky at MIT. Maybe you go further back, to Norbert Wiener and cybernetics. Personally, I think of something far less exciting: the origins of the insurance industry.
Today, what was once Edward Lloyd’s coffee shop is home to a small Sainsbury’s grocery store and a Pret-a-Manger. But in 1689, it was a hangout for sailors and those in the shipping trade. Patrons treated the shop like an office. They’d discuss work between sips, and Lloyd would listen. Eventually, Lloyd published what he’d heard in a small newsletter for the maritime industry; later, he would station observers at various ports to track the movement of ships. It was a means of offering news — and driving business for coffee sales. The paper folded, but was later resurrected by his sons as Lloyd’s List.
Over time, the data gathered about the maritime industry would balloon as retired ship captains offered expertise, and as auctions of boats took place on site. There was a special value in this data: businessmen and sea captains would place bets on which boats would return and which boats would fail. When 130 enslaved Africans were killed on a British ship to preserve fresh water for the crew, the result was a scandal — over the insurance claims. Nobody ever filed a suit on behalf of those who had been kidnapped and murdered. The owners and insurers had placed bets, and they sued each other over how those bets should be settled.
Lloyd’s was the epicenter for the insurance of the slave trade — estimated at between 80 and 90% of its business dealings. Insuring a vessel simply meant placing a bet on whether its human cargo survived the voyage. Various payments could be made, depending on how people died. Lloyd’s was a coffee shop, but it was also a hub of data surveillance and financial speculation where wealthy men could earn money, over coffee, on the death and exploitation of enslaved people.
These bets came to inform a peculiar and macabre kind of mathematics, a means of establishing risk. The past performance of captains and individual boats, and the treacherousness of certain waters, were all factored in. By quantifying past events and analyzing them for patterns, Lloyd’s turned the past into a guide to the future, and it is widely cited as the birthplace of predictive analytics. Today, Lloyd’s of London continues to make its profit off of insurance, data, and risk.
Today, predictive analytics remains the basis of an economic ideology that underwrites much of our digital world, and generative AI is no exception. It is an industry built on the exploitation of bodies and labor at a distance. It has become so ingrained in our economies that we have forgotten what data is, and what the technology and financial industries do with it.
I would not compare the current regime of statistics to the destruction wrought by slavery. I only want to acknowledge the origins and ideologies that derive from that place: a distance from the real-world effects of abstraction. Generative AI owes more to this history of data analytics than to any history of AI. It is less about figuring out autonomous systems and more about automated pattern analysis. Those patterns strip away much of the world, and in part two of this essay — which comes next week — I’ll explore the ways that ideologies of AI reject the emotional meaning of the world.
But first I want to talk about Big Data and the insistence on displacing labor. Without the massive expansion of data, generative AI tools could not exist. The rebranding of data analytics as AI severs a historical narrative and perspective about where AI actually comes from, distorting the way we make sense of it and our perception of its risks.
Social Media Was an Insurance Agent
Gathering data is an exercise of power. It starts by reducing the world, and people, to samples of behavior. Then it imposes rules, assigns categories, and limits or allows sets of actions. This emphasis on abstraction at the expense of living people reflects a deep disengagement from human joy or suffering. It has driven the tech industry to develop tools that empower that abstraction, distancing us from lived experience and connection, and leading people to believe in a kind of digital simulation of politics at the expense of local communities.
Online, the right to collect samples from our previously private lives started out as a casual coffeeshop kind of agreement. Social mediation services would extract value from our presence, and we could use the site for free. Meanwhile, in the San Francisco of the 2010s, scores of consultants made careers advising on data monetization and on tapping the power of underutilized digital assets. This data wasn’t useful on its own; it had to be activated. The reanimation of this data was achieved through predictive analytics. The idea was simple: gather enough data, from our credit card purchases and grocery store cards, and companies would find useful patterns in it. They’d use those patterns to figure out when to push sales, send us an email, or show us an ad on Instagram.
Just as data about the survival of ships would guide gamblers to certain bets, so it was believed that data could guide a vast number of decisions through sheer force of predictive rationality. At the height of this social-data fetishism, we believed that if statisticians could only average enough polls, they could predict elections. Facebook could look at the communication patterns of its users to figure out who had a crush on someone and whether it was reciprocated. Enough data could predict your sexuality, whether you were pregnant, whether your city was about to have a flu outbreak.
Through dopamine-inducing feedback loops, social media sites mined this data from us. They encouraged users to share information about themselves constantly, with smartphones reporting locations and even gamifying self-reports. Apps like Foursquare offered points for telling people where you were and how often you went there, information that was sold directly to online advertisers.
The social media model was simple: people provided content for free, and companies sold the ads. Writers were reduced to content, and reactions to that content created data that helped social media further analyze, predict, and target its users. At the heart of this practice was the view that writers and artists could be reduced to signals. The real value was mining people’s response to those signals. Today, companies are aiming to remove artists and writers from the loop entirely — it turns out, even free labor was too expensive.
Unparalleled Decisioning Power
It wasn’t just social media. On the backend of capitalism, the business-to-business (“B2B”) world sought our data too.
“Imagine knowing which elements of a multi-channel marketing campaign are effective and which aren’t, or automatically triggering personalized Web pages and offers based on a visitor’s clickstream and purchase history, or ... matching the right action to the right credit account at just the right time. You can, with MarketSmart.” (cited in Golumbia, 2009).
In pure terabytes, the vast majority of archived human knowledge is sales receipts. By the end of the first decade of the 21st century, data would be able to predict entire sentences, then books, that an author might write, or conjure a thousand museums full of a single painter’s work. In 2013, Walmart was gathering data about 20 million transactions per day. Cars collect data about the routes we take.
Generative AI might best be understood as a rebranding of Big Data and Predictive Analytics. The principles are the same as data analytics, and so are the underlying economics. Companies collect billions of data points, process them through massive data centers, and identify lucrative patterns.
Rather than the movement of ships or people, Big Data’s predictive analytics are tuned to our words and images.
Data as an Economy
Not content with the information posted across the walled gardens of Facebook or Twitter, generative AI companies claim dubious rights over images and words that were never offered to them. These were shared by individuals to a range of platforms, part of that long-forgotten agreement to trade our data for their services.
This data is no longer merely evaluated or presented to advertisers as pie charts or graphs. Words are generated based on their statistical likelihood to follow other words. We have images that emerge from random noise, based on central tendencies within the vast archives of our online visual culture. The “Data Gold Rush” has made this kind of training data a new frontier for leveraging underutilized assets. It’s an important one, because advertising revenue on social media is slowing down and users are growing skeptical about using these platforms at all.
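As a hedged illustration of that image-from-noise claim, the sketch below (in Python, with an invented stand-in “archive”) starts from pure noise and repeatedly nudges it toward the archive’s average. Real diffusion models learn a neural denoiser rather than regressing to a single mean; this only shows how the central tendencies of a dataset can pull structure out of randomness.

```python
import numpy as np

# Stand-in "archive": 100 random 8x8 grayscale images (an assumption
# for illustration; real archives are billions of scraped pictures).
rng = np.random.default_rng(0)
archive = rng.random((100, 8, 8))
mean_image = archive.mean(axis=0)   # the archive's central tendency

# Start from pure noise and repeatedly nudge it toward that tendency.
image = rng.standard_normal((8, 8))
for _ in range(50):
    image += 0.1 * (mean_image - image)

# After enough steps, the noise has all but collapsed into the average.
print(f"max deviation from mean image: {np.abs(image - mean_image).max():.4f}")
```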
Websites that once hosted large files for users to share with others (for the low, low cost of watching an ad) are leveraging their underutilized assets by selling or using this data for training AI. Getty Images, not content with Stability AI’s alleged scraping of millions of its images, trained on its own archives to build a diffusion model that generates Getty-esque images. Where once our data was gathered to serve us ads, it is now gathered to serve us chatbots and clip art.
“But that’s not AI, that’s capitalism!” I hear you shout. Well, yes. AI exists, and was built, under capitalist demands for profitability, not for public interest. The data analytics industry, and generative AI with it, is built on a long-standing regime of extracting wealth from information, and on a reliance on cheap labor to maximize the information it has collected. Treating generative AI as a revolutionary new tool, or as somehow independent of the system in which it originated, makes this connection to historical patterns harder to see, and harder to resist.
Generative AI offers a few twists. Many come in the form of chatbots that obscure what’s happening in the software by emphasizing a conversational interface. The ability to talk in real time, in simple language, to a machine is a noteworthy evolution.
But the idea that they are more “intelligent” than Google’s AdSense system is a deliberate misunderstanding.
Sure, they write words instead of identifying audiences. But they don’t actually provide answers to our questions. They present statistical extrapolations, a verbal equivalent of a machine betting on the survival of a ship that is out to sea. The questions we ask are the ship, and patterns associated with that ship’s past journeys through language are gathered through data analytics to predict likely outcomes. They steer themselves toward longer, authoritative-sounding responses.
The interface is designed to resemble a conversation, but it isn’t. It’s a summary of data most commonly associated with the words that come before a question mark.
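To make that concrete, here is a minimal sketch of the same wager in Python: a bigram table over a toy corpus. Everything in it is an illustrative assumption (the corpus, the counting, the sampling); real systems use neural networks trained on far larger archives, but the move is identical: tally which words have followed which, then bet on a likely continuation.

```python
from collections import Counter, defaultdict
import random

# Toy corpus: a stand-in for the vast archives of scraped text.
corpus = ("the ship returned to port . the ship was lost at sea . "
          "the port was busy . the sea was calm").split()

# Count how often each word follows each other word (a bigram table).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Bet on a next word, in proportion to how often it followed `word`."""
    if word not in follows:        # dead end: no observed continuation
        return None
    words, counts = zip(*follows[word].items())
    return random.choices(words, weights=counts, k=1)[0]

# "Answer" a prompt by extrapolating from past patterns: a wager on a
# likely continuation, not a considered reply.
word, output = "the", ["the"]
for _ in range(8):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))
```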
The data used as the source of these analyses comes from the same places that all data analytics has always come from. Offering our data online was once voluntary, even if the question mark was silent. The most critical of us knew that going online was subjecting ourselves to a surveillance economy — tracked and measured (but the memes were free). The ghost of Edward Lloyd is still eavesdropping on our idle coffee house chatter, but now he’s placing bets on what we’ll say next.
What the walled gardens of social media networks collected — and sold to one another — was always served up under the illusion of anonymity. Even if we knew our data was identifiable (and it almost always is), we took comfort that no one would ever care to see it. It was one drop in the ocean of data surveillance. Orwellian as it was, mass surveillance suggested anonymity.
We were so used to that agreement of trading data for services that many of us forgot it was an agreement at all. The services have become deeply integrated with our lives: foolishly, I once committed to logging into my bank account with Facebook, a practice I’ve had to work to unwind. Today the terms have shifted, and the agreements we made seem poorly chosen.
Fake Decentralization
I remember when people liked Facebook and Twitter. Silicon Valley positioned itself as punk. It was rebelling against authorities and systems that stifled creative expression, limited participatory access to media, and conspired to charge us more for pointless overhead. Then it won, and embodied all of those behaviors. The hybrid of hippie politics, techno-optimism and individualism that Richard Barbrook and Andy Cameron defined as the Californian Ideology had raised a new generation of children. For them, the dotcom crash that ended the first web boom was a bump in the road to the exponential fortunes of 2.0 tech companies.
Briefly, many of us living in San Francisco reaped the rewards everywhere we turned. It was the revival of disintermediation: demolishing pesky intermediaries that stood between ourselves and cheaper goods, services, and content. We had forums to share ideas, unmediated by the elite spellchecking of minimum-wage copy editors at our local newspapers. People were annoyed with the concentration of power within the media, and social media belonged to us — we, the social!
Just like “social,” “democratization” took on a peculiar definition. An earlier generation might have learned from our efforts at “democratizing” Iraq and Afghanistan. But this wasn’t that generation, and it took everyone by surprise. In Silicon Valley, it meant undermining the overhead associated with a legacy business and shifting the cost to everyday people — the “social.” In 2015, an internal IBM memo summed it up well:
“Digital disruption has already happened! The world's largest taxi company owns no taxis (Uber), Largest accommodation provider owns no real estate (Airbnb), Largest phone companies own no telco infra (Skype, WeChat), World's most valuable retailer has no inventory (AliBaba), Most popular media owner creates no content (Facebook), Fastest growing banks have no actual money (SocietyOne), Largest movie house owns no cinemas (Netflix), Largest software vendors don't write the apps (Apple & Google).”
This physical, truly social world would be eroded at every virtual turn. Democratization would strip out institutional overhead and costs: books would be cheaper without a bookstore to heat. The end of an era of “brick and mortar” was celebrated. But the arc of this democratization bent toward denser concentrations of power, not dispersal. The result has been further erosion of wages and value in almost every industry it has touched. The common connecting fabric of these services was a strange paradox: the belief that a centralized actor could facilitate a democratization of power.
The great disintermediaries became a new intermediary. It wasn’t a people’s revolution, it was a coup.
It isn’t a fluke of Web 2.0 or the dot-com bubble. In their book Power & Progress, Daron Acemoglu and Simon Johnson point out that computation has long been associated with promises of productivity and economic prosperity. Yet, since the introduction of the “democratized” personal computer, the facts show us something else.
“Digital technologies became the graveyard of shared prosperity,” they write. “Wage growth slowed down, the labor share of national income declined sharply, and wage inequality surged starting around 1980” (255).
With computer makers relying on large corporations for the bulk of their contracts, their products reflected the concerns of those clients. For Facebook, and most companies of the data analytics era, any attempt at service was a front. The real business was data analytics.
For anyone who embraced this radical disruption of malfunctioning institutions, disappointment followed. Facebook’s algorithmic sorting would eventually punish thoughtful content that didn’t trigger profitable online arguments. Instagram would prioritize images containing blue skies or exposed skin at the expense of other images. Twitter would turn into X. Bandcamp would collapse. WeWork would start kicking people out of unprofitable office spaces after luring companies to give up their own real estate.
Everything was subsidized by venture capital with a thirst for data, redirected to ever-more-powerful sorting algorithms and marketing backends. Around 2018, a new breed of data grab started to appear: apps that would put funny lips or haircuts on your face, swap your gender, add wrinkles or remove them. Face-swap apps didn’t just want to know where you were or where you were shopping. They wanted your face. Ideally, many pictures of your face. Cheap gimmicks of trick photography were how they would get it. Your raw images were stockpiled in face recognition systems used in surveillance — a kind of idle coffee-house chatter that would eventually be turned against minorities as “crime prediction” tools.
Then Big Data came to democratize art. We can expand IBM’s list: the most prolific producer of visual art has no artists (Stable Diffusion), and the biggest producer of words has no writers (OpenAI).
Monticello in California
Much of America’s technological lineage can be traced to Monticello, Thomas Jefferson’s sprawling estate, maintained by people he had kidnapped and enslaved. Politically, Jefferson was constantly wringing his hands about the ethical quandary of slavery, and as president he signed the abolition of the international slave trade (while keeping people enslaved at home).
As a way to accommodate this hypocrisy of conscience, Jefferson relied on a set of ingenious technologies for mediation. One could come have dinner with Jefferson and discuss the moral quandary of slavery while being served wine from a technological apparatus: a dumbwaiter, loaded by enslaved people in the basement and delivered through a small elevator that ensured no guest was disturbed by their presence.
Likewise, one would be served a meal through a rotating door, from which the food would appear on shelves as if by magic, with no need to acknowledge the service of the people kept hostage in the kitchen.
Technology has long been designed at the expense of those whose living is earned through service, for the benefit of those who pay to be served. The interface is a tool for obscuring human labor behind screens. These screens enforce a kind of spectacle — they make food or goods appear on your doorstep, without any interaction with the chefs or the driver. They allow you to hire workers to process data for your startup for pennies on the dollar. These interfaces strike at the nerve of social connection, creating diffused networks where the individual elements of the underlying system are completely obscured, rendering cohesion and solidarity nearly impossible.
For those in power and control, the abstraction of labor has a way of dehumanizing the laborers: they move from people to expenses. Many of the activities targeted for replacement are therefore not considered “human” at all, chiefly because technology encourages us to ignore our reliance on other humans. There seems to be almost a resentment of labor in the halls of technology: tech is never meant to empower workers to strike, it is meant to replace workers who someday might strike.
There was something easy about handing over access to our browsing habits and shoe size in exchange for being shown new websites and shoes, even as we knew it was all killing bookstores, record stores and community interaction. Once it killed the stores, it came for the music, then the books, then the art and photography.
This was shocking for many, because it was not only labor, but cultural memory. Photography is a technology of remembrance. It connects us to others. Sever that, and what have you got? The damage is emotional, not material. Yet our attachments to these imaginary, virtual worlds are so slippery, so mediated by spectacle, that they feel dangerous to acknowledge.
“Love Doesn’t Scale.”
The Techno-Optimists seem to assure us, simultaneously, of the value and disposability of our online communities, our shared artworks, our online conversations. They want us to share, because sharing has always been the tool they’ve used to drive monetization.
AI adds another dubious layer to this relationship. It severs the community entirely. Instead, we talk to the AI, make art with the AI. What’s monetized is no longer technology as an intermediary: charging us a communication tax in the form of sampling our data and showing us ads was Web 2.0. AI is Web 3.0, a simulation of the web based on the ghost of interactions past. Social media was a platform where we were all trapped within digital walls and surveilled in order to build this next communication regime.
At my most cynical, my most paranoid, I find myself fretting about this trajectory. I don’t believe that the internet connected us to each other. I find it has isolated us, shifting us from physical proximities to ideological ones. Now AI promises to further constrain those relationships, to move us from a time when one could speak to and hear from many, to a time when we can speak only to ourselves: one-to-none communication, a throwback to the days of yelling at the TV, but now the TV can adjust.
If this historical trajectory of leveraging underutilized assets continues unabated, the future seems bleak. Industry will aim to further tighten and constrain our interactions online until we are surrounded by engagement engines, pumping out material that keeps us typing and sharing. Grounded in the ideology of individualism, the tech industry seems likely to tighten the boundaries of these online systems so that we don’t need anyone else to do the things we love. We will type for the machine that surveils us, and share with simulated audiences.
Denying our dependencies on others is not a means of amplifying human potential. It is a tool for rejecting that potential. I see this as the true existential risk of AI. Not the machines, which simply hum math into pixels. I am convinced that those who build the technology of generative AI will aim to replace, not empower, the communities and interactions we find ourselves valuing most today. Already generative AI is hijacking the impulse of empathy and conversation. If history is any indication, the next logical step of the system’s evolution would be to control it.
Things I Have Been Up To
The Wrong Biennale is a massive, online exhibition of digital art “pavilions”. I am excited to be a curator, along with the brilliant Nadia Piet and Kwan Suppaiboonsuk of AIxDesign, of the No-Camera Cinema pavilion featuring films from the AIxD Story&Code program. I was also selected to participate in the So Far, So Near pavilion curated by Laura Focarazzo. Do check them out!
Sarah Palin Forever at the Clapham International Film Festival!
If you’re in London, the Clapham International Film Festival is showing a selection called “Machina: Artificial Intelligence Shorts & Talks” on November 10. One of my films, Sarah Palin Forever, is on the programme. It promises to be “a selection of films made using the latest AI technology exploring concepts not all of us are yet ready to embrace,” and the other inclusions look really interesting! Wish I could be there. If you go, let me know what you think!
I had a great time talking to folks in-person and online at NYU’s Digital Theory Lab organized by Leif Weatherby two weeks ago. A few photos were shared by Tannon Reckling and Marina Hassapopoulou who are doing interesting work at the intersection of AI and media.
COMMUNICATION IN THE PRESENCE OF NOISE
You can go listen to it on Bandcamp! If you’re so inclined, you can also purchase a digital download or the CD copy (with a bonus Debord/Wiener glitter sticker), or watch the final video for House of Annetta (pictured) on the site.