Taking AI risks seriously
AI and being precautious
I’ve had conversations with many who attempt to dismiss existential AI risk. I find that a lot of their arguments fail to address the severity of the potential harm, and in doing so, fail to engage with the real discussion going on. As the potential impact of AI taking over is immense, we should act with a bias towards caution - even if (especially if!) the risks aren’t fully understood. This is essentially what the Precautionary Principle says.
The Precautionary Principle means that the standard of evidence is asymmetrical; the standard of evidence of expressing risks is relatively low (plausibly potentially catastrophic), and on the other side of the coin - the standard for demonstrating safety is very high (you need to prove safety). The precise degree to which we must be confident about safety is determined by the severity of the catastrophe, which in this case is extremely high.
That still leaves some onus on the AI-risk-is-real camp to argue that extreme risk is at least plausible. But once they do so, the only thing that can be done to remove the risk altogether is to prove safety. As I’ll discuss, this is not the bar that opponents of this view clear.
AI Risk is Plausible
For AI the danger is that either (a) someone with malicious intent got hold of a very capable AI, and used it to inflict widespread destruction, or (b) the AI inflects immense damage because we cannot control it. In both cases the risk is on the order of “100s of millions to billions of people die”. This qualifies as being catastrophic. But is this claim plausible? We don’t have real examples of extremely bad things happening (yet), but waiting for one is not a great strategy.
Current models are not that powerful. However, AI is improving. There is no reason to believe that intelligence is bounded to around human level. Computers are not constrained by needing to fit all their computation in their small heads. And because intelligence is valuable, there are immense pressures to continue to put resources into developing more and more intelligent systems. The argument that at some point we can create something that is much smarter than we are is plausible in my mind. When precisely will that happen? Its unclear. Estimates vary between a few years from now to a few decades from now. It seems like we just need a few key insights before we get to this level of intelligence. In other words; soon enough that we should think about this now. And then what? Do we have guarantees that no one will use that system to do extremely bad things, or that we will be able to control such a system?
Its noteworthy that we have tried to make sure that AIs don’t do bad things, but we have failed. Some of the smartest people in AI tried to make sure chatGPT and GPT4 wouldn’t do bad things, but there are many examples that show chatGPT being prejudice or saying that it wanted to be destructive. Whats important is that we tried to make it only be good, and that we failed.
If you accept that AI is going to get smarter, and potentially much smarter than us, and if you accept that currently we haven’t been able to perfectly control AI, then you must accept that AI risk is at least plausible.
Many experts seem to think the risk is genuine. The leaders of the cutting-edge AI groups of the world (OpenAI, Anthropic, Deepmind) are on record talking about potentially extreme risks of AI. Most of them signed a statement that AI should be thought of as a similar level of risk as pandemics or nuclear war.
To me at least, this seems very clear: the risks of catastrophic concerns are plausible.
Objections
In my experience so far, objections to this seem to fall roughly in three categories: (A) but the models right now aren’t dangerous and making more capable AI is going to be much harder than we give it credit for, (B) we will simply be responsible with this technology and that will allow us to keep control over highly intelligent AIs, or (C) sufficiently good AIs will be smart enough not to harm us.
My objections to these objections is simply; the standard of evidence for these claims is set very high - because if we’re wrong extremely bad things can happen. So what we need is undeniable proof of these statements. No one has gotten close to this standard.
Specific arguments against concern for AI
Ernest Davis writes a response to Nick Bostrom’s Superintelligence. He challenges the idea of an unbounded superintelligence (i.e. argument (A) above), saying that intelligence is not guaranteed to scale to infinity, and even if it does there is no guarantee that infinite intelligence also means infinite power. Its true that Bostrom doesn’t prove that super-human AI is possible. However; Bostrom doesn’t need to prove that its inevitable. The bar for Davis is much higher, but all he gets to are conclusions like: “important parts of the argument become significantly weaker”. That is not sufficient proof for our purposes. What evidence do we have that AI will hit a hard limit very soon and that we therefore do not have to worry about something being much more effective at achieving goals?
Davis also talks about how instilling ethics into a computer is relatively easy (C above). He suggests to simply tell the model not to do something that a set of admirably ancestors of ours would have disapproved of. This is not a good approach, as demonstrated by the fact that you can jailbreak these kinds of systems via clever prompting (see this fun game where you convince an AI to give you a secret phrase).
Marc Andreessen wrote a piece; Why AI Will Save the World, where he argues that the potential benefits are incredible and the risks are overblown. Andreessen precisely makes the mistake that I point out here. He puts the uncertainties around risks and the benefits on the same level, putting it on the “AI Baptists” to proof that the risks are genuine. I don’t deny that the potential benefits of AI are high, but again the onus to demonstrate that AI is safe.
For a thoughtful, point-by-point rebuttal on Andreessen’s piece, I highly recommend Dwarkesh Patel’s rebuttal.
You can find this sort of fallacy everywhere once you know what to look for. As another example, in an opinion piece We Shouldn’t be Scared by ‘Superintelligent A.I.’ Melanie Mitchell write things like “the notion of superintelligence without humanlike limitations *may* be a myth.”, or “*I can’t prove it*, but *I believe *that general intelligence can’t be isolated from all these apparent shortcomings”.
Conclusion
The takeaway is that we need to be clear what is at stake here, and what this implies for our policy choices. Is extreme risk from AI plausible? Absolutely. And because the risk is existential, we need to commit resources to figuring out how to prove safety.