Human Compatible: Artificial Intelligence and the Problem of Control

Stuart Russell


The blogger Scott Alexander characterized the debate about AI risk in the following way:

“The ‘skeptic’ position seems to be that, although we should probably get a couple of bright people to start working on preliminary aspects of the problem, we shouldn’t panic or start trying to ban AI research. The ‘believers’, meanwhile, insist that although we shouldn’t panic or start trying to ban AI research, we should probably get a couple of bright people to start working on preliminary aspects of the problem.”

After reading this book, I have to say that I see the merits of both sides :)

I find AI risk to be an interesting topic because I am very easily persuaded about it either way: I tend to agree with the most recent thing I read about it, regardless of which side it’s on. For a long time, my view was mostly shaped by the talk “Superintelligence: The Idea that Eats Smart People” by Maciej Ceglowski, which is critical of AI alarmism. For a while I intended to read Nick Bostrom’s book “Superintelligence,” to which the talk is largely a response. But before I got around to it, I heard about this book, which is supposed to be the more grounded, measured “pro-alarmism” case, written by a highly respected AI researcher. So I read this instead.

“Pro-alarmism” is of course a misnomer because, in Alexander’s words, Russell doesn’t want people to panic; he wants to get some bright people working on preliminary aspects of the problem. He makes his case in a measured, carefully argued fashion. After revisiting Ceglowski’s talk, I think he and Russell would probably find common ground. Ceglowski writes, for example:

“The pressing ethical questions in machine learning are not about machines becoming self-aware and taking over the world, but about how people can exploit other people, or through carelessness introduce immoral behavior into automated systems.”

Russell very much grounds his argument in this type of thinking. He often refers to the ways that the relatively simple algorithms we use today, such as Facebook’s news feed, can have perverse consequences by maximizing a reward function that is not fully aligned with human needs. I think this argument is easy for people to understand, and it is much more compelling than alarmist tales about a future AI simulating trillions of people in a hellish environment.

What I found most interesting about the book was Russell’s provisional sketch of a solution. Roughly speaking, he says that trying to design a clever, bulletproof reward function for an AI to optimize is like trying to be really extra careful in wording your monkey’s paw wish. It’s better than not considering the risks at all, but it’s a very failure-prone strategy. Instead, Russell prescribes that we design AI to pursue the fulfillment of human preferences, but with explicit, acknowledged uncertainty about what those preferences are. This would result in behaviors such as frequently running its plans by humans to see if we object, and being willing to be switched off.

I find this interesting because it is roughly the same approach we use to train children not to be psychopaths. As children, we all learn over time “how to be good” through a combination of explicit and implicit reasoning (talking to grown-ups about what the right thing to do is, and observing their reactions to what we do). We all feel some uncertainty about how to be a moral person, and it persists even into adulthood, manifesting (among other things) as ongoing debates over theories of morality. Perhaps the difficulty of nailing down a clear algorithm for morality is a feature, not a bug, as they say.

At any rate, I find it interesting that Russell’s prescription would rule out one of the “memes” of superintelligence: an AI that recursively improves its own code in nanosecond-long cycles, going from Einstein to God in three minutes. If AIs are committed to regularly checking in with humans about their goals and plans, then they have to live on our time scales, just like kids do, or at least approximately so. (If I recall correctly, Ted Chiang introduces a proposition along these lines in his excellent novella “The Lifecycle of Software Objects,” in which stable virtual creatures can only be created through interaction with humans.)

Speaking of Ted Chiang, he also wrote an article for BuzzFeed relating more directly to this topic: “The Real Danger to Civilization Isn’t AI, It’s Runaway Capitalism.” In it, Chiang draws an analogy between AI and corporations, arguing that the same kind of perverse optimization process is at work in modern corporations as in the AI alarmists’ cautionary paperclip tales. Scott Alexander himself wrote a blog post dismissing Chiang’s argument, but I think he is too quick to do so, and Russell’s grounded approach lends the argument more support. Sure, there are significant disanalogies between corporations and AI design: for example, corporations don’t have explicit objective functions that they are optimizing. From Russell’s perspective, this feature should make them more “corrigible” than standard reward-optimizing AIs, yet, as Chiang argues, we still have quite a lot of difficulty correcting them. This is partly because they do offer us a great many benefits, which seems likely to be true of future AIs as well.

Some people interested in AI safety are willing to draw inferences from examples as far-flung as Cortés’s invasion of the Aztec Empire, and I actually agree with them! I found the LessWrong post making that argument quite thought-provoking. AI safety will require a lot of reasoning by analogy, since it concerns unprecedented developments. In the same vein, I think AI safety researchers would do well to reflect on the difficulties of corporate regulation and on the lessons we might draw from them.

Here I am again, agreeing with the last thing I read on AI risk!

My Goodreads rating: 4 stars