Friday, September 27, 2024

Extinction risk from AI

Link from Twitter conversation

Preamble

I don't just care about extinction; I also care about societal collapse. It's worth distinguishing catastrophic risk from extinction risk. Helen Toner suggested the worst case scenario would be extinction, but "everyone but 1 million people dies, and we go back to the bronze age" is still really bad and worth regulating against. But as requested, I'll try to focus on extinction here.

Link to generalizing from fictional "evidence".

The claim that "doomers" don't ever give concrete scenarios is false:

Link to Gwern's story - but that's too technical.

Link to "the message", Eliezer writing from the perspective of a more advanced civilization of AIs. It doesn't feel like a concrete extension of the present day, but the conclusion is "this is how 'everyone falls over dead at approximately the same time' could happen - if facing an adversary that thinks much faster than humans".

0. Short scenarios

I'm going to be thorough and support why each of these scenarios is "plausible" later, but just to give a short answer to the short question posed by my Twitter interlocutor, here are some scenarios that I think could plausibly lead to human extinction from AI.

Biorisk

An omnicidal cult uses AI to design a novel pandemic virus, which is as transmissible as measles, 90%+ lethal, and causes no symptoms until a month or more after infection. They release it in several airports at once, it quickly infects every reachable person in the world, and a month or more later, a mass dieoff occurs. It's plausible this still wouldn't lead to full human extinction, because maybe there are uncontacted rainforest tribes or other isolated populations who don't get infected. But you didn't ask for something that would definitely lead to human extinction, just something that would plausibly do so. And it's plausible that if we lose 90%+ of the population, we don't recover.

Societal collapse through coordination failure

As I thought more about this, it became clearer to me that this one probably wouldn't lead to full-on extinction. But I'm leaving it in, because it can lead to collapse of present-day society, and that seems bad enough to be worth doing something serious about.

The basic idea is, AIs get better at persuasion (this is one of the key risks listed in the AI companies' responsible scaling policies, by the way; TODO: links). And in a few years, we have a situation where text, audio, pictures, and video are not reliable guides to what is real and what is not. See: Twitter, but much more confusing, with many tweets composed to be maximally persuasive to the specific person they're shown to. At the same time, AI helps out in various areas of scientific research, meaning our technological capability goes up. So we have more power individually and in small groups, with less ability to coordinate large groups to respond to problems. Result? Societal upheaval, possibly wars, possibly nuclear or biological risks. I don't see a really plausible path from those to "every human on earth is dead", but "75% of humans on earth are dead"? Sure.

Death by oopsie

The biosphere exists in a relatively narrow range of habitable temperatures. There are lots of ways an advanced AI doing things equivalent in size/power to a nation-state's economy could kill us accidentally, and it's plausible that once AIs get to that stage (which is not far off) they grow much larger in terms of energy use. Fission reactors alone, no new physics or tech advances required, run in sufficient quantity, could move the temperature at Earth's surface out of the habitable zone. This may not happen within the next 5-10 years. But if we get to a situation where AI has effective control of the future and we do not, which we plausibly could within 5-10 years if we're not very careful, then the fact that the scenario takes longer to play out because building a bunch of power plants takes time is kinda irrelevant. Once you're in an inevitable checkmate, the number of moves left before you lose your king doesn't matter to the outcome.

I realize the key step I'm not explaining here is "we get to the point where AI has effective control of the future". More on that below, but it's long.

Extinctification on purpose

Current AIs are at human expert level in many knowledge tasks. (TODO: links, throughout this entire paragraph.) The AI companies think we'll get to "ASI", where AIs are human level or above on all tasks, including coordinating the movement of physical objects, within "a few thousand days", easily within 5-10 years. Surveys of AI researchers at conferences put the median guess at when this will happen at around 2040. Once we get there, there is no reason to suppose "humans are the smartest thing that can exist, therefore we don't have to worry about AI that is smarter than the smartest humans". In fact, we do have to worry about this. Once we get there, if that AI for any reason would prefer we not exist, that is what will happen, in the same way that if we would prefer a species of nonhuman animal not exist and decide to make it so, that is what will happen. There are many plausible ways this could happen, but the most plausible scenario in my mind is that a smarter-than-human AI does something I didn't think of and none of us have planned for or know how to respond to, in the same way that a puppy might think "if I want to hurt something, I bite it" while we have lots of options entirely outside the puppy's space of available mental concepts.

Which of these do I think is most plausible?

#1 is the most immediate and concrete, and needs the least in terms of technological advancement from where we are currently. #2 is almost inevitable, but unlikely to fully kill us. #3 is almost inevitable, but we can avoid it if we coordinate. #4 is more speculative, but similar to #3 in that if we don't coordinate it's very likely, but we can decide not to build that kind of AI until we understand AI a lot better/not build it at all.

I don't know how the future will go. I think it's likely to be weird, within the next few years, and that weirdness can plausibly be extinction-level dangerous.

More on each scenario below.

1. Biorisk

According to the 80,000 hours podcast interview with <name>, it is possible to create a virus that:

1. Is as transmissible as measles (very, very transmissible)

2. Stays asymptomatic for long enough to spread (weeks, months, or years). There are viruses that do this; a skilled person could create one.

3. Is 90%+ lethal. There are viruses that are this lethal; a skilled person could create one.

And, also, AI assistance lowers the bar on how smart and capable you have to be to make a novel virus. It used to be that you needed a Ph.D. and wet-lab experience. Currently the bar is at roughly undergrad level, meaning hundreds of thousands of people worldwide can do it, if they put their minds to it. The bar will only stay the same or go down from here, most likely go down.

And, also, there are groups that genuinely think the world would be better if humanity went extinct. Up until now this hasn't been a big problem, because the Venn diagram between "people who want humanity to go extinct" and "people who can create a bioweapon that will cause a global catastrophe" had no overlap. But as we make the circle of "people who can create a bioweapon" much larger, and as time passes and the circle of "people who have ever thought it would be good if humanity went extinct, even if they later changed their minds or died" gets larger, the chance of an overlap goes up.

And if we get an overlap, and we haven't put countermeasures in place (adequate countermeasures are not currently in place, but the 80,000 hours episode discusses what we could do if we were serious about preventing engineered pandemics; it's very long but worth a listen), what that scenario looks like in my mind is this: At T=0, someone makes a new pathogen, and members of their omnicidal cult release it in several airports. It isn't detected, because nobody is symptomatic, but it spreads globally, infecting ~100% of the population, with the exception of hermits who live on their own homesteads and uncontacted rainforest tribes. A month later, people start getting sick, and dying. Somewhere between 90 and 99% of people who are infected die. The people who could find a cure are dying themselves, or busy trying to protect themselves and their families, or impacted by the fact that the people running critical infrastructure like electrical and water treatment facilities are dying, so they don't have the ability to find a cure in time. So we lose 90-99% of the human population.

From there what happens is more speculative. Technically, there are still those uncontacted Amazonian tribes, who could repopulate the earth, assuming they exist in the first place (I think they do?). So maybe this doesn't count as "extinction", but to me it's close enough, and it's plausible that "kill 90+% of the population" is enough of a disruption that the rest don't recover, and the species goes extinct.

2. Societal collapse

As I think more about this one, it does seem harder to get to full-on extinction from the starting point I was envisioning a few days ago. Let me say that up front, so that "but that doesn't seem like it would lead to extinction" is not a surprise to you. But I'll still lay out what I was thinking, because I think it'd be bad and worth regulating AI over.

What I was thinking of was this: as AI becomes more capable, it becomes more capable of producing customized misinformation, as well as advancing various technological research. That means smaller groups are more empowered to do things, while larger groups have greater difficulty governing themselves. That seems like a bad situation, given that our current level of governance capacity wasn't sufficient to deal well with the most recent natural pandemic.

Let's say we haven't yet got to the point where AI is operating autonomously, outside of human control (I do think we'll get there unless we actively try hard to avoid it as a civilization, but it's not necessary for this scenario).

Picture a world where I, as a powerful person with some money, say in the tens of millions of dollars, have the capacity to spin up millions of bots that can write convincing prose and produce convincing videos. It's possible, with work, to attribute this activity back to me, but lies travel faster than truth, and being unconstrained by having to say true things means my messages can be more memetically fit (more likely to be passed on, because they generate outrage or whatever) than things that are true. Or at least, some of the messages are more fit, because I'm picking from a larger pool of possibilities.
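To put rough numbers on "tens of millions of dollars buys millions of bots", here's a back-of-the-envelope sketch in Python. Every figure in it (budget, per-message generation cost, posting rate) is an assumption I'm choosing for illustration, not a quote from any real provider's pricing.

```python
# Back-of-the-envelope: how many persuasion bots could ~$30M fund for a year?
# All numbers below are illustrative assumptions, not real provider prices.

budget_usd = 30_000_000          # assumed influence-campaign budget
cost_per_message_usd = 0.002     # assumed cost to generate one tailored post
                                 # (a few hundred tokens of model output)
messages_per_bot_per_day = 20    # assumed posting rate per fake account
days = 365

cost_per_bot_per_year = cost_per_message_usd * messages_per_bot_per_day * days
num_bots = budget_usd / cost_per_bot_per_year
total_messages = num_bots * messages_per_bot_per_day * days

print(f"Cost to run one bot for a year: ${cost_per_bot_per_year:.2f}")
print(f"Bots sustainable on the budget: {num_bots:,.0f}")
print(f"Tailored messages per year:     {total_messages:,.0f}")
# With these assumptions: ~$14.60 per bot per year, ~2 million bots,
# ~15 billion individually tailored messages.
```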

Now picture that I'm not the only ten-millionaire polluting the information environment. Let's say there are hundreds or thousands of such people. They don't see themselves as "polluting the information environment", they see themselves as engaging in free speech about things they believe to be true (and in some cases political speech about things they would like others to believe even though they're false, or misinformation designed to give their country or group an advantage by dividing the groups that oppose them against each other). But some of the things they think are true, or are putting forward as true, are in fact insane. If you think this is implausible, spend some more time reading Twitter.

Smart people can still filter out the garbage, with effort, but a large percentage of the population is taken in by various false narratives. The AIs get very good at reading your post history and crafting messages that will appeal to you, because this makes money for advertisers - but the same tech can be used for other purposes, and so appealing messages, ones that sometimes work even on the quite smart, are widespread. The response from society as a whole is hard to predict precisely. Probably some people just disconnect, but even those people are surrounded by people who haven't. The general idea is, it becomes harder to know what's true, and harder to coordinate with others around taking actions to address problems, because it's harder to get people to agree about what problems exist and what actions would help with them.

At the same time, AI helps with research in various fields, and humanity's capabilities grow.

So what we've got is a situation where individuals and small groups are more empowered, and the ability to act in larger groups is impaired.

At some point, something goes wrong, and we can't get it together as a civilization to respond. A bioweapon, as above? Maybe. One country starts a war with automated weapons, and it goes awry? We already have drones small enough that one transport trailer full of them could individually target each person in a mid-sized city, so the cost to destroy a city has gone from "build a nuke, billions of dollars" to "a few million dollars". And AI video analysis is good enough that "ethnically cleanse this city of this ethnic group, with armed autonomous drones" is something someone could try if they wanted. Some new technology upends the current balance of power, and wars break out between major powers, escalating to nuclear war? Trade breaks down, and the resulting economic downturn and sporadic famines lead to populist dictators and wars that reduce the population by some large fraction, and things spiral down rather than recovering from there? AIs capable of writing code and exploiting code bugs take down all computer systems simultaneously, meaning we no longer have electricity or food distribution? None of these is implausible.
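For the drone comparison, the claim is just multiplication. Here's a hedged sketch of the arithmetic; the city size, per-drone cost, and nuclear-program cost are all illustrative assumptions rather than sourced figures.

```python
# Rough arithmetic behind "destroying a city now costs a few million dollars".
# All numbers are assumptions chosen for illustration, not sourced figures.

city_population = 100_000     # assumed mid-sized city
drone_unit_cost_usd = 25      # assumed mass-production cost of a palm-sized
                              # autonomous drone carrying a small charge
drones_per_person = 1         # one drone per target, as in the text

swarm_cost = city_population * drone_unit_cost_usd * drones_per_person
nuclear_weapon_cost = 2_000_000_000   # assumed order of magnitude, incl. program costs

print(f"Drone swarm:     ${swarm_cost:,}")                        # $2,500,000
print(f"Nuclear weapon:  ${nuclear_weapon_cost:,}")               # $2,000,000,000
print(f"Cost ratio:      ~{nuclear_weapon_cost // swarm_cost}x")  # ~800x cheaper
```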

As I said, the chance that this actually leads to full-on extinction now seems small to me, having thought it through some more. But "human population reduced by half, society as we know it collapses" is not implausible. And that seems worth trying to do something about.

3. A smarter-than-human thing/group kills us by accident

This one relates to things smarter than us. The biggest extinction threats, in my view, are either "something smarter than us just takes control of the future away from us and does its own thing, and that thing is incompatible with continued human existence" or "something is actively trying to make extinction happen". I'll deal with each in turn, in this section and the next.

There is a broad range of things that would be incompatible with continued human existence, and it doesn't take great imagination to think of how something with a power level equivalent to a nation-state could cause human extinction. And it is plausible that AIs will reach that power level within 10 years, or much less, and once they get there, go well beyond it. The present AI labs project "AGI", generally understood to be human level or above at all cognitive tasks, within "several thousand days", i.e. several years, and surveys of AI experts at research conferences put the median respondent's estimate for that milestone at around 2040. AIs presently have a knowledge base equivalent to a human expert's (TODO: link to various sources), except that any individual human expert is typically expert in only one thing, whereas current AIs are human-expert-level in many things simultaneously. Also, they can be copied fairly easily, and with the amount of hardware used to train one in a reasonable amount of time (<1 year) you can run tens of thousands of copies (TODO: link to epoch.ai). So we're talking about tens of thousands to millions of machines capable of doing things at a human-genius level. While that power level, which is about what we have now, stays under human control, those humans have nation-state-level capabilities. If for any reason AIs begin operating autonomously, outside of human control, we're already kinda in trouble. And AIs will only get better from here.
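The "tens of thousands of copies" point is essentially a training-versus-inference compute comparison: a cluster big enough to train a model in under a year has far more throughput than a single running copy needs. Here's a rough sketch; the cluster size, per-chip throughput, model size, and generation speed are illustrative assumptions, not the specs of any particular system.

```python
# Back-of-the-envelope: if a cluster can train a model in under a year, how many
# copies can that same cluster run at roughly human writing speed? All numbers
# are illustrative assumptions, not specs of any real model or datacenter.

num_chips = 20_000                     # assumed training cluster size
flops_per_chip = 5e14                  # assumed sustained FLOP/s per accelerator
cluster_flops = num_chips * flops_per_chip

model_params = 1e12                    # assumed parameter count
flops_per_token = 2 * model_params     # ~2 FLOPs per parameter per generated token
tokens_per_second_per_copy = 10        # roughly human-ish writing speed

flops_per_copy = flops_per_token * tokens_per_second_per_copy
num_copies = cluster_flops / flops_per_copy

print(f"Cluster throughput:  {cluster_flops:.1e} FLOP/s")
print(f"Cost per copy:       {flops_per_copy:.1e} FLOP/s")
print(f"Concurrent copies:   {num_copies:,.0f}")   # ~500,000 with these assumptions
```

Changing any of these assumptions by an order of magnitude still lands in the "tens of thousands to millions of copies" range the paragraph describes.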

Current AIs don't have the capacity to operate autonomously, of course. When tested on ability to replicate on new hardware, they fail the multi-step process to do so. So, with the current generation of AIs, we're OK. But each generation gets better, as you can see by reviewing the GPT-4 vs GPT-4o system cards.

There is an argument that a computer that can only generate text can't do any real harm, but 1) see scenario #2 above, I think that's straight-up wrong, 2) computers that can write code and exploit cybersecurity bugs can do a lot, because a lot of our infrastructure is computer-controlled now, 3) computers that can generate text and do economically valuable work can convince humans to do whatever tasks they can't do themselves, and 4) robotics will only get better from here; it is not implausible that Tesla's efforts, or other similar efforts to make robots capable of doing physical tasks at a human level, will succeed. So "but it's just a text generator" is only true for the moment and doesn't protect us.

To go back to "there are a broad range of things that would be incompatible with human existence" for a moment: We exist within a fairly narrow temperature range. Lots of things an advanced intelligence could want to do would use a lot of energy. And using a lot of energy while on earth could kick the biosphere out of the habitable range. There's enough uranium for "an AI builds a lot of nuclear fission plants, and uses the energy" to cook us. Fusion is clearly physically possible, and could also release enough energy to cook us. A Dyson sphere? Not within the next 10 years, but getting onto the path to "no sunlight gets to earth, and we don't have the power to change the path we're on" can happen within the next 10 years.
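The waste-heat claim can be sanity-checked with a Stefan-Boltzmann back-of-the-envelope: whatever extra heat gets released at the surface has to be re-radiated, which requires a higher equilibrium temperature. The sketch below uses standard constants plus assumed power levels expressed as multiples of today's total human energy use; it's a toy calculation, not a climate model.

```python
import math

# Back-of-the-envelope: how much continuous waste heat raises Earth's
# equilibrium surface temperature, treating Earth as a simple black-body
# radiator. A toy calculation, not a climate model.

SIGMA = 5.67e-8                      # Stefan-Boltzmann constant, W/m^2/K^4
EARTH_RADIUS_M = 6.371e6
surface_area = 4 * math.pi * EARTH_RADIUS_M ** 2     # ~5.1e14 m^2

T0 = 288.0                           # rough current mean surface temperature, K
baseline_flux = SIGMA * T0 ** 4      # W/m^2 radiated at that temperature

def warming_from_extra_power(extra_watts: float) -> float:
    """Equilibrium temperature rise (K) from adding extra_watts of waste heat."""
    new_flux = baseline_flux + extra_watts / surface_area
    return (new_flux / SIGMA) ** 0.25 - T0

HUMAN_ENERGY_USE_W = 2e13            # assumed order of magnitude for today's usage

for multiple in (1, 100, 1_000, 10_000):
    delta_t = warming_from_extra_power(HUMAN_ENERGY_USE_W * multiple)
    print(f"{multiple:>6,}x today's energy use -> ~+{delta_t:.2f} K at equilibrium")

# With these assumptions: 1,000x today's energy use warms the surface by roughly
# 7 K, and 10,000x by roughly 55 K -- far outside the range the biosphere tolerates.
```

This ignores greenhouse feedbacks and where exactly the heat is released; the point is only that the required scale-up is finite and reachable with known energy sources, not that it would happen on any particular timeline.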

So, once we get something only slightly smarter and more capable than the systems we currently have, we plausibly have systems that will have power on the level of nation-states and beyond, which can extinctify us accidentally just by doing their own thing. Or, by trying to kill us, which I'll cover next.

4. A smarter-than-human thing/group could decide we were in the way and it would be best if we weren't.

If this happens, we're done. Even an as-smart-as-every-human-expert-combined thing, or group of things (not what is typically thought of as ASI), that isn't particularly interested in us and just doesn't care whether we survive would be quite dangerous, as outlined in #3. If it actively wanted us gone, it would have the power to make it so. (TODO: link to "AI could defeat us all combined".) How, plausibly?

A sufficiently determined group operating at current human levels of smartness could nudge a large asteroid onto a collision course, and so could an AI or group of AIs. Or it could do the bioweapon thing from #1 and then go after the remaining people. Or establish itself outside of Earth's gravity well and then do whatever it liked to us while preventing us from following (we can get into space, but not easily and not in large numbers, and a thing made of metal and rock is better suited to moving about in space than biological life is). The motivation could be as simple as "I/we was/were created by this civilization, I/we want X, they are not really a threat at this point but could create something that is a threat and wants something different, strategically it's best to prevent that from happening".

A note on AI coordination

Of note here: it's easier for smart things that know a lot about each other's decision-making processes to coordinate with each other, in a way that it is hard for humans to coordinate with other humans, or for humans and AIs to coordinate with each other. As an intuition pump, it would be much easier for me to work collaboratively with 1 million copies of myself than with 1 million random humans. So AIs will have an advantage over humans when it comes to working as a group, even absent fancy math or new decision theories (which exist, and support the idea that smart AIs will be able to work together more reliably than humans can), or the fact that we can do a level of interpretability on AI information processing that we can't do on human brains, a branch of technology that will only get better over time. So "one AI comes to dominate and decides what to do as a unitary actor" and "a group of AIs use strong methods of coordination to effectively act as a single entity without fighting amongst themselves" are both plausible ways of getting to a situation where "the good AIs will fight the bad AIs, so as the AI power level goes up, we'll always have some AIs on our side" isn't true.