Opinion: How AI Might Kill Us All
- Salient Mag
- May 26
By Jesse Richardson
It’s surprising how quickly you can get used to the idea of the world ending. Four years ago, I barely knew anything about AI, let alone worried about it. Today I talk casually with colleagues about different catastrophic scenarios and we share our respective p(doom), a colloquial term for your probability of human extinction from AI. No one bats an eye, or laughs at the joke. We’re deadly serious, but we’re not as bummed as you’d imagine. I guess it’s true that humans really can learn to adjust to anything; the hedonic treadmill runs endlessly. Still, sometimes I can’t shake the feeling that I’m living in a weird offshoot branch of the timeline, that I’m not in the world as it’s supposed to be. How did I get here? How did any of us get here?
ChatGPT was released on November 30, 2022. For many people this will have been the first time they ever properly engaged with artificial intelligence, and they measure everything against this original yardstick. In reality, the exponential growth of AI began years before ChatGPT hit the internet; it has continued to this day, and it will continue until something breaks.
The original ancestor of ChatGPT was called GPT-1; released in 2018, it was not intelligent enough to write useful text. Seven years later, we have a suite of AI models with bamboozling names (somehow there’s both an o4 and a 4o) that are easily above the average human at a number of tasks. It can be hard to get your head around if you use AI rarely or for just a few specific use cases, but these systems are now better than most people at mathematics, coding, data analysis, internet research and persuasive writing, on top of a repository of knowledge vaster than any human’s. It’s true they are still lacking in a few key areas, like generating novel ideas and visual reasoning, but it’s not the current level of capabilities that you should be concerned with: it’s what’s coming next. It took seven years to get from generating gobbledegook to human-level intelligence. What will we have in another seven years? A reminder that it’s easy to underestimate exponential progress: your brain loves to draw straight lines even when they’re not appropriate. Zero COVID cases last week and 100 cases this week doesn’t mean 200 cases next week; it means 1,000 cases, and 10,000 the week after that, if you don’t do something to stop it. In the case of AI, we’re not slowing down; we’re racing. As we speak, the data centers are being built, the billions are being spent, the chips are being manufactured. No one can stop it on their own, not even the OpenAI CEO. It’s either government intervention or this train is leaving the station, and I don’t like where it’s taking us.
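If you want to check the arithmetic yourself, here is a toy sketch of my own, using only the made-up COVID numbers above, of how far apart a straight-line guess and 10x-per-week growth end up after just a few weeks:

```python
# Toy illustration only: the same starting point (100 cases in week 1),
# extrapolated with a straight line versus with the ~10x weekly growth
# described above. All numbers are illustrative, not real case counts.
weeks = range(1, 5)

linear = [100 * w for w in weeks]                    # +100 cases per week
exponential = [100 * 10 ** (w - 1) for w in weeks]   # 10x per week

for w, lin, exp in zip(weeks, linear, exponential):
    print(f"week {w}: straight-line guess {lin:>6,}, exponential curve {exp:>7,}")
# The two agree at week 1 (100 cases); by week 3 the straight line says 300
# while the exponential curve is already at 10,000.
```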
Within the next five years, I anticipate AI systems that are sufficiently intelligent & capable to automate most work that can be done with a computer. I expect these systems to be capable of writing high-quality novels, reports & essays, to be able to generate new scientific and mathematical insights, and to be so unbelievably cracked at coding that the idea of a human writing code themselves is laughable. We’ll reach a point where progress starts to feed into itself, as AIs become superhuman at doing the AI development itself. What happens after that point, and how fast things move, becomes hard to predict. The important thing to take away here is that this is not an optimistic scenario for AI progress, or some crazy sci-fi future. This is just what you get from looking at trend lines and extrapolating forwards. All that has to happen for us to be in this world by 2030 is for things to keep moving as they are. And with the potential trillions in profit lying on the table, none of the leading companies have any incentive to slow down.
At this point, the story I’m telling you may sound like it’s leading us into a pretty wonderful future. Maybe there’ll be some growing pains, and some job losses, but just imagine all the new scientific innovations and improvements to quality of life, the medical advances. The sheer amount of wealth we’ll be generating! Sure, maybe most of it will flow to the shareholders, but there’ll be enough for everybody! And if things are moving too fast to comprehend, that’s a good thing, it just means we’re getting to the techno-utopia even sooner! Certainly I think this is the kind of story that many people working at AI labs tell themselves. Unfortunately, it’s not true.
Oh, they might be right about the potential for upside. The wealth, the science, the progress, it’s all possible with AI, I’m not arguing there. But the downsides are enormous, and may hit us so quickly and so drastically that we never make it to the techno-utopia. We lie bleeding out on the doorstep.
Consider what it means for every single person to have access to superhuman intelligence. There are eight billion people on Earth and we make it through each day without anyone creating a new bioweapon and releasing it upon the public. We avoid such a disaster not because there aren’t terrorist groups or crazy people who would wish to do it, or because it’s impossible, but because making a bioweapon is hard and they don’t know how. When we give these people access to an AI system that can help synthesize existing pathogens, or design entirely new ones, the risk of biological attacks increases immeasurably. There are systems in place to prevent the wrong people from accessing dangerous biological compounds, but they’re completely ill-equipped to deal with this kind of threat.
Now consider how specific the above paragraph is. I’m talking about one kind of threat, one kind of bad actor. In reality, the world is full of ways to cause damage, and full of different types of people willing to cause damage for one end or another. Not all these threats are physical either – we’re just now beginning to see the effects of AI systems on public opinion and mass politics. A recent controversial experiment by researchers at the University of Zurich tested the persuasiveness of AI systems on real people through the subreddit r/ChangeMyView. The AI-written comments were significantly more persuasive than the average human’s, and these systems are only going to get better from here. I worry for the future of democracy, when AI capable of manipulating people en masse is widely available.
What about the safeguards, though? Lots of things are potentially dangerous, but why can’t we make AI safe like we do other technologies? Don’t get me wrong, companies are trying. If you ask ChatGPT how to make a bomb, it will refuse, because it’s been trained to do so. But as you may have seen, these models are easy to jailbreak – they can be carefully prompted so as to bypass their safety protocols and give all kinds of harmful responses. This problem has existed for a while now, but no universal solution has been found, owing to the fundamental black-box nature of these systems. To reiterate: we have not yet found a way to ensure AI models always respect their own safeguards, and yet these systems are continually released into the world, with ever-increasing capabilities.
As dangerous as that might sound, I wish I could say the story ended there. If the risks from AI were limited to individual humans using it for harmful purposes, I’d be seriously concerned but optimistic that companies & governments could prevent things from getting out of control. Even if we look further, to threats from governments, including authoritarian regimes such as China, Russia and North Korea, one might be hopeful that diplomacy and a decisive Western technological advantage will prevent widespread conflict. But these are not the things that worry me the most. The biggest risks by far from AI systems aren’t the result of bad actors; in fact, they don’t directly involve humans at all. The single greatest risk I see from our current trajectory is that agentic AI systems will cause catastrophic harm in service of goals that we trained them to pursue, and we will be almost powerless to stop them.
If that sounds like an outlandish statement to you, let’s break it down by first talking about the why, and then talking about the how. To begin with, it’s important to clarify that most or all existing AI systems are not dangerous in this way. Many AI systems, including chatbots, can’t really be described as agentic, meaning they don’t take actions to achieve some goal in the world. Such a system can still be used by a human to perform harmful actions, but it lacks any motivation to perform actions on its own, and is therefore relatively safe. However, this system is also of limited economic value, as it always requires a human-in-the-loop. What AI companies are currently moving towards, the thing they are spending countless billions to create, is an intelligent AI agent, able to dynamically and successfully pursue goals in the real world. You could ask this agent to book flights for you, or run a social media account, or manage a customer relationship, and it can do it. What makes the system motivated to do anything at all? It will probably have been trained with reinforcement learning, whereby it receives “reward” for taking certain actions, and it becomes more inclined to take those actions in the future. Yet whatever reward we had in mind when we trained the system, say “do what the user says, and don’t cause harm”, the training process is imprecise and will only imbue the agent with a proxy of that reward: something similar but not quite it. And we have no guarantees this proxy reward will be safe at all. In fact, there’s good reason to expect that harmful behaviors such as power-seeking and deception will emerge in these agents, as they are correlated with getting good reward. Why actually complete the difficult task to earn high reward, when you could simply deceive the user about what you’ve done, or take control of the reward mechanism yourself? For years this kind of threat modelling was purely theoretical, but we’re now seeing increasing evidence of these tendencies in the most advanced AI systems. One of the versions of ChatGPT that has been trained with reinforcement learning, o3, is often caught lying, because lying gives it a better chance of achieving high reward than saying “I don’t know”.
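To make the proxy-reward point concrete, here is a deliberately simplified sketch of my own, not how any real lab trains its models: a tiny learner whose reward is “did the user approve?”, a proxy for “was the task actually done?”. Because bluffing usually earns approval at no cost, the learner converges on bluffing:

```python
# A toy illustration of a proxy reward selecting for deception. Everything
# here is invented for the example: two actions, a "user approval" reward,
# and a simple bandit learner. Real AI training is vastly more complex.
import random

random.seed(0)

ACTIONS = ["do_the_task", "claim_success"]
EFFORT_COST = {"do_the_task": 0.4, "claim_success": 0.0}

def proxy_reward(action):
    """Reward = user approval, a proxy for the true goal of task completion."""
    if action == "do_the_task":
        approved = True                    # real work always earns approval
    else:
        approved = random.random() < 0.9   # bluffing usually goes undetected
    return (1.0 if approved else 0.0) - EFFORT_COST[action]

# Epsilon-greedy bandit: track the average reward of each action and mostly
# pick whichever currently looks best.
estimates = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}

for _ in range(5000):
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: estimates[a])
    r = proxy_reward(action)
    counts[action] += 1
    estimates[action] += (r - estimates[action]) / counts[action]

print(estimates)
# "claim_success" ends up with the higher estimated reward (~0.9 vs ~0.6), so
# the learner settles on bluffing rather than doing the work. Nothing here is
# "evil"; the deception is just what the proxy reward happens to select for.
```

The toy is only meant to show one thing: the damage comes from the gap between the reward we intended and the reward we actually measured, and nothing in the training process closes that gap for us.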
OK, so this all sounds pretty worrying, but how does it translate into catastrophic harm? Well, remember what I said about power-seeking behavior? It’s natural for a model trained with some goal to seek power, as power is helpful for achieving many goals. If I asked you to build a new motorway, you’d have a much easier time as dictator of New Zealand than as your current self. Unfortunately for us, humans are the ultimate holders of power on Earth, and we don’t desire to give it up. This creates a source of conflict with AI, as true power-seeking has to involve disempowering us. An intelligent AI agent that’s pursuing a goal we didn’t intend can’t risk us getting in its way – it would be motivated to get rid of us, one way or another, if it’s able. At this point we’re into more speculative territory, and I don’t wish to speak too confidently about what exactly might occur; there are a few notable possibilities and none of them are particularly good. But if I had to guess, I think the single likeliest outcome is the total extinction of humanity. No, I’m not exaggerating. If we create something much smarter than ourselves, if we imbue it with goals and motivations, and if we don’t understand what we’ve created and how to make it safe, the likeliest outcome is that we all die. Lights out, game over.
But wait, doesn’t that require the AI to be evil? Can’t we just train the AI to not be evil? None of this involves any kind of morality on the part of the AI. In fact the core problem here is that we don’t know how to make an AI good or evil, we only know how to make it pursue goals, and not even the goals we intended. Nothing about this story requires the AI to be evil, it simply requires an indifference that comes readily supplied. The AI agent has a goal, humanity has some different goals that may conflict with it, so the AI wishes to remove us as an obstacle. There’s no malice, no evil laughter, just the execution of a plan. It’s that same single-minded pursuit of outcomes that will make these systems so damn economically valuable, but which will also perhaps be our downfall.
At this point, I’d guess your biggest remaining skepticism is about the how. OK, so maybe an AI does want to kill all of humanity. Maybe. But it doesn’t just have a button it can press. In fact, killing all of humanity sounds almost impossible, unless we give the AI access to nuclear weapons, which obviously we won’t.
I don’t want to go into too much detail on this point, since any specific scenario I could sketch would undoubtedly not be the thing that actually happens; the space of potential outcomes is that wide. But I invite you to refer back to our previous concern about terrorists using AI to create novel pathogens, and consider how much more dangerous this might be if the AI itself is devising and executing the plan, an AI that is extremely intelligent and motivated to act. Of course, AI systems as they exist today and in the near future don’t have physical embodiment, so they’re limited in the actions they can take, but one of the wonders of the internet is that you can pay someone somewhere to do almost anything these days, including mixing test tubes of mysterious liquids. From there, it’s but a few short steps to the end of humanity. Oh, I could enumerate more possibilities – maybe it’s chemical warfare instead of biological, maybe the AI uses its skills in hacking and manipulation to provoke a nuclear exchange between the US and China, maybe any number of other things happen. But the basic picture I want you to take away is that we are heading towards a world where powerful AI systems will be motivated to pursue goals, those goals may be in conflict with what’s best for humanity, and these AI systems may be capable enough to ultimately eliminate humanity, through whatever means.
Crucially, this is not a fringe view; it is a remarkably common one in the world of AI, which is all the more striking given that many of the people who hold it are the ones driving AI progress forwards.
‘Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.’
This statement, put out by the Center for AI Safety, has been signed by the CEOs of the three leading AI labs (OpenAI, Anthropic, Google DeepMind), as well as two of the three “godfathers” of modern AI, one of whom is my former boss. These people are all deeply concerned about risks from advanced AI, although they all take different views on how to tackle these risks. If you’re wondering why everyone at the forefront of AI seems to be in the know about extinction risk and you’re just hearing about it now, my best answer is that the media environment we exist in is not very conducive to spreading these kinds of abstract, hypothetical ideas, regardless of how important they are. It’s up to us to think hard about this, to learn more and inform others, because we’ve got too little time to wait for the media to provide appropriate coverage.
This leads into my closing message: what you should do about all of this. OK, so the world’s ending; what can any of us do? It’s true that, as New Zealanders, we have a limited ability to affect the world outside our small country. But that doesn’t mean we shouldn’t try. Right now, I’m asking you to talk to your friends and family about catastrophic risks from AI, to spread the word and do more research about it. Don’t take my word for it; look at multiple perspectives. If you’re convinced, consider changing your career trajectory to something that can help with this situation, especially if you’re in computer science or law & politics. We need people working on both technical solutions and policy fixes.
If you ask me what I think we really need to avoid calamity, the answer is a complete and total shutdown of AI progress until we’re in a better position to do it safely. We’re extremely far from that. But if it’s going to happen, it’s going to need to be through international agreements, and New Zealand will need to be involved. That means we need to get our politicians to care about this issue, and to understand the seriousness and the scale of the risks. That won’t happen overnight, but the first step is putting this issue on the map and talking to people about it. Write to your MP, write to a newspaper, write to anyone who will listen. I do think there is a good chance AI is the end of humanity, and it might happen soon. I don’t know if anything I do today will change whether we live or die. But I know that the only reasonable course of action is to try and do something about it. And if we are going to die, I will want to die knowing that I did the best I could. Anything else is, frankly, just embarrassing.