
A variety of reasons, most of which I suspect you will find boring/evasive/unconvincing if your p(doom) is high. (I mean that in a non-hostile way.)

-As a general heuristic, if something looks like "traditional apocalyptic doomsday cult thinking", it probably is, which means it is probably not actually rooted in a rational examination of the evidence; and also, since I am socially part of the believer group, I should down-weight the arguments if they seem convincing to me.

-As a second general heuristic, detailed, specific predictions about the long-term future have a bad track record. There are usually more not-that-unlikely ways for these sorts of predictions to fail to come true than ways for them to actually come true. "AI will take over" is not ultra-specific, so it doesn't do *too* badly on this score, but it is still a factor.

-It seems like a lot of things have to go wrong to arrive at doom:

-We need to achieve AGI at all. (Though if we are talking about risk over the next century, it seems unlikely we won't. But clearly not impossible.)

-We need to actually build powerful agents before alignment becomes trivially easy, and not rest content with somewhat less powerful agents and superintelligent reasoners that don't have goals (in the same way GPT-4 right now has no goals, although you can *sort of* turn it into an agent).

-There need to be failures in alignment for very powerful agents, despite the fact that we are all aware that powerful agents need to be aligned, and that current techniques for training AIs seem to result in them learning to mimic human reasoning, including human *moral* reasoning.

-It needs to be the case that unaligned AIs are actually able to takeover: I find stories about how this would actually occur often quite sketchy (and I have read a lot on this topic), and also that they tend not to take account of the amount of security people would probably put in place to prevent this happening, given that powerful agents are obviously a security risk. This is more likely I think if robotics is good, since there will probably be a lot of robots lying around to be hacked, than if it is bad, but on current progress I expect highly intelligent AIs to probably arrive *before* we figure out how to train really good robots.

-There needs to not be "good AI" that is aligned and able to *prevent* takeover by hostile AIs.

If the probabilities you put on these are, say, 95% for the first, 65% for the second (conditional on the first), 20% for the third (conditional on the first and second), 50% for the fourth (conditional on 1-3), and 15% for the last (conditional on 1-4), that works out to about a 1% chance of takeover. Admittedly, 1% is still quite a bit more than 0.01%, but I think this already illustrates that even quite modest optimism about alignment and the ability of good agents to defend against bad ones can already get you quite low. And then I want to adjust down further because of the two general heuristics I mentioned first.
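For concreteness, here is a minimal sketch of that multiplication (the figures are just the illustrative ones from the paragraph above, not estimates I'm committed to):

```python
# Illustrative conditional probabilities from the paragraph above.
steps = {
    "AGI is achieved at all": 0.95,
    "powerful agents are built before alignment is solved": 0.65,
    "alignment fails for very powerful agents": 0.20,
    "unaligned AIs can actually take over": 0.50,
    "no aligned AI prevents the takeover": 0.15,
}

p_takeover = 1.0
for step, p in steps.items():
    p_takeover *= p

print(f"P(takeover) ~= {p_takeover:.2%}")  # ~0.93%, i.e. roughly 1%
```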

I should also mention that when I've read stuff about why alignment will actually be really hard, it mostly wasn't that convincing. Originally, Bostrom (and maybe Yudkowsky? I haven't read much Yudkowsky) seem to have been worried partly about "monkey's paw", be-careful-what-you-wish-for scenarios where a particular goal gets programmed in and then we discover it's not what we want: i.e. it makes us happy by directly stimulating our brains while we lie in Matrix-style pods or something. But current ML techniques don't involve programming an explicit final goal into the model. Since then, I have read people like Cotra and Hubinger on why an unaligned AI is the default outcome of current training techniques, but I found their reasoning murky and hard to follow, and also, they were assuming relatively minimal safety effort, to a greater degree than I think is actually likely to happen.


> As a general heuristic, if something looks like "traditional apocalyptic doomsday cult thinking" it probably is,

If you want to know whether nuclear weapons would ignite the Earth's atmosphere, you can't have a long argument about various biases people might have. You need to actually look at the evidence. You can list some hypothetical biases people might have. I can list hypothetical biases in the other direction.

> As a second general heuristic, detailed, specific predictions about the long-term future have a bad track record.

Not THAT bad, but not great. I mean, if you take serious experts trying their best to predict the future, I would say somewhere between 10% and 90% of the attempts were close to correct.

Is "doom" a more specific prediction than "not doom"? Most arangements of atoms are not humans. So if we have maximal uncertainty over arangements of atoms, that would put the chance of doom as very high.

"It seems like a lot of things have to go wrong to arrive at doom:"

In any specific doom scenario, lots of things have to happen, in a way that, given the result, can be considered going wrong.

But there are many different doom scenarios. And also many AI companies making many AIs. So while any particular AI development effort hitting a particular type of doom might be unlikely, some AI hitting some doom is more likely.
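A rough way to see this, assuming (unrealistically) that each development effort is an independent draw with the same small per-attempt risk; both numbers below are made up purely for illustration:

```python
# Rough illustration: if each of n independent AI development efforts
# carries a small chance p of ending in some doom scenario, the chance
# that at least one of them does grows quickly with n.
def p_any_doom(p_single: float, n_attempts: int) -> float:
    return 1 - (1 - p_single) ** n_attempts

print(p_any_doom(0.01, 1))   # 0.01  (one attempt)
print(p_any_doom(0.01, 50))  # ~0.39 (fifty attempts)
```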

> We need to achieve AGI at all. (Though if we are talking about risk over the next century, it seems unlikely we won't. But clearly not impossible.)

Granted.

> We need to actually build powerful agents before alignment becomes trivially easy, and not rest content with somewhat less powerful agents and superintelligent reasoners that don't have goals (in the same was GPT-4 right now has no goals, although you can *sort of* turn it into an agent.)

Alignment won't magically turn trivially easy. At best we would be looking at a world where humanity put in the hard work to solve alignment and not build powerful agents until we had solved it.

I really don't think alignment is going to be trivially easy. At best, it may be basically doable. No theoretical breakthrough in understanding makes a design typo-proof, idiot-proof, or political-pressure-proof.

And until that point, we need to coordinate all of the many AI companies and governments around the world to not produce superintelligent agents. And this relies on knowing where the dividing lines are. How much compute can you throw into reinforcement learning before it becomes dangerously intelligent? We just don't know.

This also assumes that "superintelligent reasoners that don't have goals" are both possible and safe. Such systems are at most one stupid question away from agentic superintelligence, at least if they answer questions. Just ask them what an agentic superintelligence would do, or what the code for an agentic superintelligence is. And you could destroy the world with a self-fulfilling prophecy if it predicts the future.

> There need to be failures in alignment for very powerful agents, despite the fact that we are we all aware that powerful agents need to be aligned, and that current techniques for training AIs seem to result in them learning to mimic human reasoning, including human *moral* reasoning.

Current techniques are known to be inadequate when it comes to superintelligence. Current techniques are based on imitating humans, both the good bits and the bad bits. But this is less like the AI being moral and more like an alien actor pretending to be a human. Current techniques are based on a huge quantity of trial-and-error fine-tuning, a "tweak it till it works" approach. This fails when the AI is smart enough to break out of its training environment, or smart enough to realize it's being tested and act differently.

> It needs to be the case that unaligned AIs are actually able to takeover: I find stories about how this would actually occur often quite sketchy

I can strongly predict that Deep Blue will beat me at chess, despite being unable to predict exactly what moves it will make.

> and also that they tend not to take account of the amount of security people would probably put in place to prevent this happening, given that powerful agents are obviously a security risk.

Humanity's best security is still sometimes broken by humans. And the security around current AI varies between abysmal and non-existent.

> This is more likely I think if robotics is good, since there will probably be a lot of robots lying around to be hacked, than if it is bad, but on current progress I expect highly intelligent AIs to probably arrive *before* we figure out how to train really good robots.

There are a lot of humans around waiting to be hacked. There are plenty of companies that will manufacture components to your specifications without asking questions, and plenty of people who can be tricked, brainwashed, or paid into following the AI's plans. The AI doesn't need robotics; there are loads of gullible humans who can be persuaded to do whatever it wants.

> There needs to not be "good AI" that is aligned and able to *prevent* takeover by hostile AIs.

Can a "good AI" prevent takeover? Some of the techniques people are suggesting to make AI good would also seriously weaken it. Like designs that imitate humans, or that aren't agentic, or that are myopic. Many alignment plans come with large performance penalties. And even if it didn't, I suspect a battle between superintelligences may leave the world unsuitable for human habitation. (Irradiated hellscape?)

> But current ML techniques don't involve programming in an explicit final goal of the model.

Sure. If it's agenty, it still has some goal implicitly represented inside it, in a format that we can't directly read or write. And that goal can be monkey-pawed.

The difficulty is, in some sense, that we need to specify what we actually want, and this is really hard.

There are a fair number of disjunctive arguments for AI doom, with monkey's paws being only one of them.
