8 Comments
Duane McMullen

I read Y&S to be saying:

1. A superintelligence will destroy humanity.

2. Current AIs are probably not superintelligent, but we cannot be certain.

3. We do not know when we will cross the threshold into superintelligence as we throw increasing resources into developing more capable AIs.

4. We are, at best, playing at the edge. It is highly plausible that we will cross over into superintelligence BEFORE we figure out what the edge is.

In that sense, your review does not contradict the argument.

Patodesu

The first points Y&S are really making are:

A superintelligence built with current AI alignment techniques will destroy humanity.

We are super mega far from having good-enough techniques.

So he is countering those arguments, the second one in particular.

Simon Lermen

I responded to one of your claims, i.e. that we may be able to use AI to solve the alignment problem: https://simonlermen.substack.com/p/why-i-dont-believe-superalignment

Rafael Ruiz

(Disclaimer: I'm only halfway through the book)

I think a key crux is whether takeoff will be fast or slow (continuous or discontinuous). You say that we can use AI+Humans to align ASI-, then use ASI-+Humans to align ASI, and so on. But humans might not be able to add much to the conversation once we reach ASI. The analogy is like pairing Magnus Carlsen with Stockfish 17 to build Stockfish 18. Magnus Carlsen might have nothing to add to the conversation, and might even be dead weight at that point.

You might say that, well, at least humanity can put humanistic values into the ASI. Like, Magnus can put in his values and bend the way the future Stockfish 18 will play. But that presupposes that technical alignment will be solved to at least a sufficient degree, and that no surprises arise from the behavior of Stockfish 18 or its descendants once it's built. Once built, it might be running the show if it can recursively self-improve and process things much faster than humans.

Also, we have to assume that all ASIs will be aligned, or that the aligned ASI will destroy the others if they were to arise. If we have several ASIs by several different actors, some aligned, some misaligned, we would be living in a very fragile and unstable world.

For what it's worth, I think this is the most promising path to alignment. But it is a tricky path.

comex

But will we *need* to add to the conversation at that point?

Here’s a somewhat more optimistic chess analogy.

As you get better at chess, I think your move evaluation function mostly gets refined rather than replaced. One of the first things you learn in chess is not to give away your pieces for free. But if a beginner thinks a move is bad because it gives away pieces for free, chances are Magnus Carlsen will also think it’s bad, and so will Stockfish. Higher levels of play are mostly about finding the differences between moves that seem equally good at lower levels of play.

Sometimes that isn’t true. Sometimes Stockfish will make a move that seems outright bad to a lower-level player, seemingly giving away a piece for free – only to start some insane sequence that gives it an advantage 8 moves down the line. But this is rare. And when it does happen, the confusion is only temporary: the value of its move becomes clear once the sequence is finished.

The ASI equivalent would be something like: As ASI gets more advanced, it’ll start to come up with subtler and subtler moral preferences that we can’t understand or control. But those preferences will only come into play to distinguish between choices that seem morally equal (or near-equal) to us. In rare cases, the ASI will take actions that seem morally wrong to us at the time, but only because understanding their moral value requires predicting the future better than humans can. Those actions will be understandable in hindsight.

Again, this is an optimistic analogy. But I do think it’s a possible outcome, perhaps even a likely outcome.

Rafael Ruiz

Stockfish, if you let it think for a long time, can find checkmate in 16 and things like that. I don't think the heuristics humans use are *that* similar to the way Stockfish thinks. I don't think it usually thinks "Don't give pieces away", but rather "Nf3 has failed 37% of the time, and Be5 88% of the time (and thousands of other combinations, anticipating several moves ahead), and the best one is Nf3, so I'll play Nf3".

I think your ASI equivalent already presupposes that we can solve the alignment problem. There's no reason an AI will be moral by default, and plenty of reason to think that such an alien, non-biological intelligence might have very strange "preferences".

And even if it's aligned, maybe it kills all humans because it sees what we're doing in terms of factory farming or other things it considers morally atrocious.

Howard Hansen

If someone builds it and we all die, will it matter?

Devon

Yes.
