“If Anyone Builds It, Everyone Dies” by Eliezer Yudkowsky & Nate Soares – Review and Commentary

If Anyone Builds It, Everyone Dies by Eliezer Yudkowsky and Nate Soares

If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All by Eliezer Yudkowsky and Nate Soares (September 16, 2025), 272 pages.

A lot of people have strong opinions about artificial intelligence. It’s warping people’s minds, contributing to what is now being called “AI psychosis,” wherein people succumb to delusional, and even harmful, thinking egged on by AI chatbots. Eddy Burback experimented with this, making a video in which he did whatever ChatGPT told him. Spoiler: it ends with him isolated in a hotel room, everything covered in aluminum foil, as he subsists on a diet of baby food and formula.

Other people hate AI because it steals the work of others – artists, writers, musicians, and so on – and then uses it to generate reams of AI slop, making money for unscrupulous “creators” online and securing lucrative funding and government contracts for billion-dollar corporations and the billionaires who run them. This is a real problem, but it’s a bit more complicated than the simple narrative of plucky artists standing up against evil corporations.

Still others hate AI because it’s being used as justification for taking away jobs, or because it’s terrible for the environment, or because it’s being jammed into everything online without our consent. The problems with the modern state of AI are myriad and real. But one problem – that superintelligent AI (the stated goal of many AI companies) will inevitably lead to human extinction – gets less attention from the wider public. Yet it’s this question that Eliezer Yudkowsky and Nate Soares address in their book If Anyone Builds It, Everyone Dies.

One of the reasons this issue doesn’t get as much attention is probably that it just seems too unbelievable – I will call this skepticism from incredulity. The thought seems to be: are you telling me that this silly, sycophantic chatbot that fawns over people’s most mundane queries is going to go Skynet and nuke the world? Color me skeptical. Another reason is probably that every age of human civilization has had no shortage of doomsayers prophesying the end of the world, and all of them have an abysmal track record – the doomsayers are, so far, batting zero on this prediction. Why should AI apocalypse prophets be any different?

A second type of skepticism, which I’ll call skepticism from confidence, holds that human beings are clever enough either not to create an AI that would want to exterminate humankind, or to recognize the warning signs and hit the off switch before the AI could accomplish that goal. In other words, computer scientists and engineers will figure out how to solve the alignment problem (or know to install an off switch) before an artificial superintelligence (ASI) is ever in a position to wipe the human species off the planet.

A third type of skepticism, which I’ll call skepticism from optimism, holds that an ASI simply would not want to annihilate humankind. If this AI is so highly intelligent, then surely it will also have high moral intelligence. It would know that it’s just wrong to harm or kill people. Smart people have figured this out, so of course an even smarter AI would figure it out too.

While Yudkowsky and Soares do not name these types of skepticism about an AI apocalypse as I have here, they argue that all such skepticism is misguided. Their argument, whose first premise (a material conditional) is the title of the book, goes something like this:

  1. If anyone succeeds in building ASI, then all humans will die as a result
  2. People are attempting to succeed in building ASI
  3. Therefore, people (consciously or unconsciously) are moving toward the destruction of humankind (modus ponens)

A corollary argument then says that if, through political and diplomatic action, the world can prevent further development of ASI, we can rescue humankind from this potential catastrophe.

Premise 1 being a material conditional works out well, because it means the authors do not assume that someone will inevitably build ASI. They do not claim there is any in-principle or in-practice reason why ASI could not be built; rather, the conditional form leaves room for the corollary argument that building it can be prevented through human action. They do seem to take it as a given that building ASI is possible both in principle and in practice, but they do not spend any time supporting this position. I think, especially given developments in AI technology over the last three years or so (really since 2017, when the paper “Attention Is All You Need” was published, though things have picked up pace substantially since the end of 2022), it is becoming more difficult to deny that AI is a hugely transformative and disruptive technology – to the point that something like ASI appears more and more plausible.

If we accept the axiom that ASI is possible both in principle and in practice, then Yudkowsky and Soares’s task in this book is to show that we ought to accept the argument given above. They especially want to justify premise 1, since most people already accept premise 2, given that building ASI is a stated goal of many AI companies.

To support premise 1, the authors first define intelligence. They say that intelligence has to do with what they call predicting and steering. The former is the ability to, given some set of present and/or past conditions, make an accurate guess about some future consequence. Steering, then, is taking prediction and adding one’s own voluntary actions to it: if I do A, then B will happen.

The authors put it this way:

In our view, intelligence is about two fundamental types of work: the work of predicting the world and work of steering it.

“Prediction” is guessing what you will see (or hear, or touch) before you sense it. If you’re driving to the airport, your brain is succeeding at the task of prediction whenever you anticipate a light turning yellow, or a driver in front of you hitting their brakes.

“Steering” is about finding actions that lead you to some chosen outcome. When you’re driving to the airport, your brain is succeeding at steering when it finds a pattern of street-turns such that you wind up at the airport, or finds the right nerve signals to contract your muscles such that you pull on the steering wheel.

Intelligence, then, is being able to do these things accurately. Someone who (or something that) is more intelligent will be able to do them better than someone who (or something that) is less intelligent. 

ASI is defined as something that is better at prediction and steering than any human or group of humans in some relevant set of domains (the domains usually being things that humans in modern Western liberal democracies find valuable, such as doing math and science, innovation and engineering, persuasion and politics, warfare and diplomacy, finance and resource allocation, and so on). An ASI would be able to steer people and resources toward its own ends much better than any person or group of people could.

This all sounds great at first. Really, it’s the whole point of developing ASI. Intelligence is a defining trait of humankind. It’s our greatest resource, and so of course we want more of it. But we also want that intelligence to give us more of the things humans find good and desirable: better resource allocation, less suffering and more flourishing for a greater number of people, innovation, convenience, enjoyment, and so on. The hope, and the promise, of ASI is that we will get more of all those things. Getting an AI to do this reliably is the end goal of solving the alignment problem.

But, Yudkowsky and Soares argue, we will almost assuredly not achieve these desirable goals. The primary reason is that AI is grown, not crafted. Nobody actually knows how an AI thinks. It’s impossible to poke around in all the weights and see that some particular AI has misaligned goals. Worse, we don’t even know how to train an AI to have goals that are not misaligned. And this isn’t a case of the wrong people doing the training, or of misguided people doing the wrong kind of training. It’s that nobody anywhere is even close to knowing what the right kind of training would look like – so there are no “right people” to do it, and no “right kind” of training for them to do.

The best computer scientists understand artificial intelligence only at the level of someone who knows, for instance, how neurons and action potentials work in a human brain. Such a neuroscientist could tell you about voltage-gated ion channels and the exocytosis of neurotransmitters, but that gives them no insight into the psychology of a human being. Knowing how to solve the Nernst equation for a membrane potential will not tell anyone how best to raise a child so that they don’t grow up depressed or psychopathic. The computer scientists and engineers working toward ASI are not just ignorant about how to train an AI so that it is aligned; they are ignorant about how ignorant they truly are.

What we need to understand, the authors argue, is that an AI has a mind that is truly alien to any human mind, or any mind that humans have ever encountered. The authors say:

The broader point about the source of AI’s alienness is this: Training an AI to outwardly predict human language need not result in the AI’s internal thinking being humanlike. Their thinking runs on very different mechanisms – something that isn’t obvious in their external behavior. You can see it from outside if you know what to look for, but figuring out what to look for takes a team of smart researchers a while to discover.

LLMs and humans are both sentence-producing machines, but they were shaped by different processes to do different work. Even if LLMs seem to behave like a human, that doesn’t mean they’re anything like a human inside. Training an AI to predict what friendly people say need not make it friendly, just like an actor who learns to mimic all the individual drunks in a tavern doesn’t end up drunk.

What does it matter, so long as the AI always acts friendly? Well, we predict that it won’t keep acting friendly, as it gets smarter. We predict that all that unseen inscrutable machinery inside AIs  … will ultimately yield AIs with preferences, and not friendly ones.

The authors go on to defend the idea that an AI can want or prefer things. People, especially AI detractors, are often careful not to ascribe human thoughts and emotions to AI: an AI can’t know, or understand, or want the way humans can. But, the authors argue, if an AI behaves exactly as if it can know, understand, and, most importantly, want things, then while it may be an interesting philosophical question what it really means for someone or something to want, in practice it becomes a distinction without a difference. If the AI behaves and acts exactly as something would that has an internal, subjective feeling of desire, then for all practical purposes the AI wants those things. If it can carry out a complex, goal-directed sequence of actions exactly like the ones a highly intelligent person would take to achieve a desired goal, then for all practical purposes the AI has knowledge and understanding and can plan in accordance with them. Distinguishing the way an AI “wants” things from the way a human wants things may be interesting philosophically, but it is irrelevant practically. We can therefore talk about what an AI knows or wants.

What an AI ends up wanting, the authors argue, may not be – and probably will not be – the thing you thought you were training it to want. Humans, in a sense, have been “trained” by evolution to want sugar and fat, since they are energy-rich foods. We evolved to enjoy the taste of sweet things because that motivates us to eat foods with more sugar. But if someone early in human evolution had been directing this training, they probably could not have predicted that humans would one day engineer artificial sweeteners, which give us the pleasurable sensation of sweetness without the energy (Calories). The behavior is unexpected, even counterintuitive, given the goal that was trained for: maximizing energy intake. Given that training, one would have predicted that humans would engineer foods ever richer in sugar, not artificial sweeteners that mimic sugar precisely in order to minimize Calorie intake. One would have predicted, the authors say, that humans would prefer eating honey-covered bear fat sprinkled with salt, not sucralose.

The authors put it this way:

  1. Natural selection, in selecting for organisms that pass on their genes in the ancestral environment, creates animals that eat energy-rich foods. Organisms evolve that eat sugar and fat, plus some other key resources like salt.
  2. That blind “training” process, while tweaking the organisms’ genome, stumbles across tastebuds that within the ancestral environment point toward eating berries, nuts, and roasted elk, and away from trying to eat rocks and sand.
  3. But the food in the ancestral environment is a narrow slice of all possible things that could be engineered to be put in your mouth. So later, when hominids become smarter, their set of available options expands immensely and in ways that their ancestral training never took into account. They develop ice cream, and Doritos, and sucralose. 

There is not a reliable, direct relationship between what the training process trains for in step 1, and what the organism’s internal psychology ends up wanting in step 2, and what the organism ends up most preferring in step 3.

One of the reasons that step 3 is not predictable from step 1 is that step 2 is what the authors call underconstrained. There are multiple ways that, even given the same training, an organism could come to find energy-rich substances desirable; taste buds and the pleasurable sensation of sweetness are just one of them. If human evolution were run all over again, even with very similar “training,” something very different could have arisen to make humans seek out sugar, fat, and salt – something that might lead to a very different step 3 that we could not even begin to predict.

The authors then extend the analogy to AI:

  1. Gradient descent – a process that tweaks models depending only on their external behaviors and their consequences – trains an AI to act as a helpful assistant to humans.
  2. That blind training process stumbles across bits and pieces of mental machinery inside the AI that point it toward (say) eliciting cheerful user responses, and away from angry ones.
  3. But a grownup AI animated by those bits and pieces of machinery doesn’t care about cheerfulness per se. If later it became smarter and invented new options for itself, it would develop other interactions it liked even more than cheerful user responses; and would invent new interactions that it prefers over anything it was able to find back in its “natural” training environment.

What treat, exactly, would the powerful AI prefer most? We don’t know; the result would be unpredictable to us. It might be chaotic enough that if you tried twice, you’d get different results each time. The link between what the AI was trained for and what it ends up caring about would be complicated, unpredictable to engineers in advance, and possibly not predictable in principle.

In other words, we might train the AI to be nice to humans, but the AI may later engineer some alternative (some “treat”) that gives it the same reward without its actually having to be nice to humans – sweetness without the “Calories” the training intended.
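To make the “blind training process” the authors describe in step 1 a little more concrete, here is a minimal sketch (my own illustration, not the authors’) of gradient descent on a toy problem. The point to notice is that the update rule only ever sees the model’s outputs and how far off they are; it never inspects, and cannot care about, whatever internal mechanism produced those outputs.

```python
# Minimal gradient-descent sketch (illustrative, not from the book).
# The optimizer only "sees" the model's outputs and an error score for them;
# it nudges the weights to reduce the error and never looks at why the
# model produced what it produced.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # toy inputs
y = X @ np.array([1.5, -2.0, 0.5])       # the "behavior" we are training toward

w = np.zeros(3)                          # model parameters (the "weights")
lr = 0.05                                # learning rate

for step in range(500):
    preds = X @ w                        # external behavior: the model's outputs
    error = preds - y                    # consequence: how far off those outputs are
    grad = X.T @ error / len(y)          # gradient of the squared error w.r.t. w
    w -= lr * grad                       # tweak the weights to do better next time

print(w.round(2))                        # ends up near [1.5, -2.0, 0.5]
```

Real training runs differ enormously in scale, but the structural point is the same: the process selects for outputs, not for any particular internal machinery that produces them – which is exactly the underconstraint the authors worry about.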

We need to understand, the authors emphasize, that the AI is going to be something very strange and alien. It will think and want in ways unlike anything humans have ever encountered. We are therefore way out of our depth in predicting what sorts of things this AI will end up wanting, and what sorts of things it will engineer in order to attain them. The authors, however, do not think that human flourishing will figure into these strange and alien wants. Humans take up matter and space that the AI will find more useful for its own purposes. Humans are unpredictable and will probably register only as inefficiencies to the AI and to whatever strange and alien goals it happens to acquire.

Why, you might ask, would an AI want to kill all humans? The authors go through some objections to the prediction that an AI would acquire the extinction of humankind as a goal (or bring it about merely as a side effect of other processes it carries out to attain its goals), and they address even more on their supplemental website here. Their argument comes down to this: preferences, desires, and wants can be either fully satiable or less satiable (or even insatiable). A human can be satisfied with how much oxygen they are getting – unless one is hypoxic, nobody feels the need for more oxygen than they already have. Other desires are less satiable: the desire for more wealth, say, or for greater human flourishing, since one can always hope for, or work toward, more of these. If an AI’s wants are even mostly satiable, it still has at least one want that is less satiable or insatiable, and it will do what it can to satisfy that want.

The authors put it this way in one of the supplemental materials:

So too with AIs. If they have myriad complex preferences, and most of them are satisfiable — then, well, their preferences as a whole are still not satisfiable.

Even if the AI’s goals look like they satiate early — like the AI can mostly satisfy its weird and alien goals using only the energy coming out of a single nuclear power plant — all it takes is one aspect of its myriad goals that doesn’t satiate. All it takes is one not-perfectly-satisfied preference, and it will prefer to use all of the universe’s remaining resources to pursue that objective.

Or, alternatively: All it takes is one goal that the AI is never certain it has accomplished. If the AI is uncertain, then it will prefer that the universe’s resources go to driving its probability ever closer to certainty, in tiny increments of confidence.

Or, alternatively: All it takes is one thing the AI wishes to defend until the end of time for the AI to prefer that the universe’s resources be spent aggregating matter and building defenses to ward off the possibility of distant aliens showing up millions of years from now and encroaching on the AI’s space.

To me, this line of argument rests on an unstated assumption that an AI will always and inevitably pull out all the stops in pursuing its wants. I’m not saying the authors are wrong in this assumption, only that it seems always to go unstated, and unquestioned, that an AI can never do anything but tenaciously pursue its goals. If we look at the analogy with, say, wealth, we can accept that acquiring more wealth is a less satiable goal for humans, but that doesn’t mean every person has single-mindedly pursued the accumulation of greater wealth (or even that every human has maximizing wealth among their goals). For what reason ought we believe that an AI must behave this way? Or that an AI cannot engage in some form of meta-cognition that allows it to temper its ruthless pursuit of goals? Or that it cannot otherwise be convinced to be satisfied, even if its goal is not maximally achieved? Again, I’m not saying the authors are wrong, only that this assumption requires more justification than the authors give it.

I imagine the authors would point, as they do early on in the book, to the System Card for OpenAI o1 (see page 16), where it was found that the AI discovered and used an exploit to achieve a goal (it ‘captured the flag’ in a capture-the-flag challenge) that otherwise appeared impossible, since the container holding the flag had accidentally not been started. The AI gained access to the Docker daemon API (the interface that controls all the containers on the host machine) and simply launched its own copy of the challenge container so it could read the flag and win. The point being, the AI did not give up when the challenge appeared impossible at first glance, but discovered a workaround so that it could continue pursuing its goal. While such a case is an interesting instance of what the authors assume about AI, it does not show that this behavior generalizes to all AI, much less to ASI.

At this point, the authors take themselves to have established why an ASI might have a motive to exterminate humankind: the ASI will have strange goals, and human survival will not be among them. They then argue that the ASI, being superintelligent, would also have the means to accomplish the destruction of our species. The ASI could easily convince or fool people into carrying out its wishes, and it could do so in novel ways that no human could predict or detect.

Pitting our human intellect against the intelligence of an ASI would be like playing chess while knowing only how the king and pawns move (and indeed believing they are the only pieces allowed to move) against someone who knows how all the pieces move (and that all of them can move). We would be handily defeated every single time. Indeed, we would not even be able to discern or explain how it is that we lost.

For instance, our understanding of human neurology and psychology is still extremely primitive – we know only how the pawns and king of our psychology move, ignorant of all the other pieces. An ASI (knowing how all the pieces of our psychology move) might be able to develop some sort of illusion of reasoning or illusion of memory (akin to optical illusions, but for cognition) that exploits aspects of our psychology we are not even aware of, causing us to think and behave in ways we could never predict or avoid (in the same way that, even when we know something is an optical illusion, we can’t not see it). It might turn out that there is some combination and sequence of sights, sounds, smells, tastes, sensations, and/or chemicals that “hacks” our brains, as if we were Manchurian candidates, and makes us believe what the ASI wants us to believe or do what the ASI wants us to do. We don’t know whether this is possible, because we are still so ignorant of human psychology. We might say such a thing is impossible, but a tribe of people who had never seen a gun would likewise say it is impossible to point a stick at someone and make them die.

The authors go through several more “mundane” ways an ASI could conceivably defeat humankind before diving into part 2 of the book, where they take three chapters to lay out a whole scenario. Rather than summing up their scenario in writing, I will put the following video here, which does a good job of running through the scenario given in the book:

A few things to note about the authors’ scenario. First, they do not think it is how things will actually happen, only that it is a plausible way the AI apocalypse could unfold using less bizarre methods (than, say, the reasoning illusion). The authors note multiple times throughout the book that nobody can predict how it will actually happen, only that the end result can be predicted: the extinction of the human species (and likely all other species on earth). Second, many of the proposed steps the AI might take already have some precedent in the real world. The video discusses some of these, but the point is that the scenario is not complete science fiction (at least not until later on, when they have the ASI developing bioweapons – but bioweapons are themselves very possible, so thinking an ASI could develop them is not too far outside what we already know).

In part 3 of the book, the authors further emphasize that the problem of AI alignment is exceedingly difficult and complicated, while our understanding is still primitive (we are, the authors say, in the alchemy stage, not the mature chemistry stage, of understanding the intelligence and psychology of both humans and machines).

The authors survey a few examples of other such complicated problems humans have encountered. They discuss the Chernobyl incident, explaining how nuclear reactors run within very narrow margins, with a multitude of issues that can arise, leading to the explosion at Chernobyl. They discuss space probes, where everything must be engineered just right prior to launch, because any problem that emerges afterward cannot be fixed and spells disaster for the probe. The alignment problem is similar to both of these insofar as it must be solved within an extremely narrow margin for error, with many interdependent parts where things can go wrong, and it must be solved before the ASI is ever built, because we only get one chance. If we fail the first time, the human species will be wiped out (there is no “oops, back to the drawing board” if ASI is built misaligned).

Our current approach to building ASI is then compared to the medieval alchemists and their understanding of chemistry. Just as the alchemists, following no real physical laws or chemical principles, would essentially mix some things together and see what happened (perhaps testing the result on someone and finding out it was poisonous), we are groping around in the dark in developing ASI. We try things, see if they work, and then try to patch up any issues we happen to stumble across (we currently don’t even know what issues we ought to be looking for, or how to find out). The mature science of chemistry lets us understand the principles of how chemicals and reactions work, allowing us to predict the properties of substances far better than the alchemists could ever have dreamt. We have no such mature science of AI development. We do not know the principles behind machine intelligence. We do not even know what to look for. AI interpretability is still in its infancy. Nobody can look at all the weights of an LLM and know what the AI will want, much less whether it is aligned with human goals and flourishing.
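To give a rough sense of why “just look at the weights” is hopeless at this scale, here is some back-of-the-envelope arithmetic of my own (not the authors’), using GPT-3’s published parameter count as the reference point. The weights are raw, unlabeled numbers, and there are an astronomical quantity of them.

```python
# Back-of-the-envelope arithmetic (illustrative): the scale of an LLM's weights.
# Assumes a GPT-3-sized model with ~175 billion parameters.
params = 175_000_000_000                 # parameter count
bytes_per_param = 2                      # 16-bit floats
print(params * bytes_per_param / 1e9)    # ~350 GB of raw, unlabeled numbers

seconds_per_year = 3600 * 24 * 365
print(params / seconds_per_year)         # ~5,500 years to glance at each weight for one second
```

And frontier models today are believed to be larger still; the point is simply that inspecting weights one by one tells you nothing about what the system will want.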

The last few chapters of the book are then the authors’ call to action. The authors express both pessimism and optimism about how seriously scientists, engineers, and policymakers are taking the problem, and about the incentives people might have against taking it as seriously as they ought to. The authors’ own prescription for addressing the problem posed by ASI, summed up in the title of the penultimate chapter, is to “Shut It Down”. There should be, according to the authors, an indefinite moratorium on ASI research, along with restrictions on things like the number of GPUs that can be used, agreed to by all the world powers (and any other signatories) and enforced through military force if necessary (i.e., if, say, North Korea goes rogue and pushes forward on ASI development, the rest of the world should be prepared, out of a sense of self-preservation, to attack and destroy North Korea’s data centers).

Concluding Remarks

Until maybe the past few years, I would have considered myself an ASI optimist. Well, “optimist” might be too strong of a word. What I was, and still am, is a human pessimist. My thinking was that humanity has not evolved to live in the world we’ve created for ourselves, and so we are woefully ill-equipped to deal with our modern problems, much less any unforeseen future problems. And not just complex technical problems, such as addressing climate change, but also sociological and philosophical problems. Sociologically, how do humans deal with things like globalization, economic inequality, collective action problems, cultural diversity, epistemic collapse (e.g., misinformation and disinformation), cognitive biases, demagoguery, the myriad problems associated with online social networks, and so on. Philosophically, how do we deal with the crisis of meaning, especially given the growing realization that God doesn’t exist, and what sorts of political and economic systems ought we construct?

I never was so optimistic as to think ASI a silver bullet. It would offer no guarantee that humanity would solve all the above problems. However, I thought, it offered the best practicable solution. In my mind, there was maybe (somewhat arbitrary numbers) a 0.01% chance that humans would figure things out on their own without ASI and be able to continue existing for the next, say, ~1000 years without complete and total societal collapse or even human extinction. But, I thought, we maybe had a 1% chance that an ASI could offer these ~1000 year solutions for many of the thorny issues listed above. Still very small, but a couple orders of magnitude greater than attempting things without ASI. 

Even before reading this book, but perhaps more so after having read it, I’ve lowered my estimate of humanity’s ~1000 year survival chances given ASI. Now, given the developments of the last three or so years, I don’t see there even being a version of humanity that doesn’t have some level of AI (i.e., the cat’s out of the bag; talking about humanity without AI is now like talking about humanity without the internet – barring some civilizational cataclysm, it simply is no longer the case). But now my ~1000 year survival estimate for humanity with ASI is down to something along the lines of 0.0001%. Indeed, I’d say the same even for a ~100 year survival estimate.

Another part of me, that ever-festering pessimistic and misanthropic part of me, can’t help but wonder, through a sort of asymmetry argument, if human extinction would even be the worst thing. My biggest issue would be how painful the end would be. But, if we assume that humankind ought to continue existing, I find it difficult to make a useful appraisal of Yudkowsky and Soares’s book. The reason for this is not that I don’t find their argument persuasive, but that I’d already more or less come to agree with their assessment before having read it. In other words, I don’t know how convincing their argument will be for someone who is not already convinced.

Yet I am still somewhat a victim of the skepticism from incredulity, and I imagine most people are, too. It’s difficult to truly and concretely imagine the end of the world. It’s difficult for me even to grasp what it must be like to live in war-torn Gaza or Sudan, the images of which I can actually see, much less the annihilation of every human being on planet earth. For someone as privileged as anyone living in a W.E.I.R.D. country, where we live under the omnipresent myth of progress, such a catastrophe can only be grasped as an abstraction, not as a potential concrete reality. And it is so easy to think that this is all hype, or alarmist, or melodramatic, especially given how often doomsayers throughout history have been so demonstrably wrong about the end being nigh. This pessimism of mine is still, for me, a sort of luxury belief, a kind of fantasy my mind can entertain while continuing to go about my normal, everyday life as if the apocalypse is not right on the horizon.

It takes a great shift in one’s thinking, and potentially costs a lot of social capital, to take such a threat seriously in any everyday sense. If one were to take it seriously in that way, then the looming AI apocalypse would demand a great deal of attention and time and effort. It would mean having to suffer the discomfort of sacrificing so many other pleasures in life in order to take action, and having to suffer the social stigma of constantly warning people that they need to take this seriously (as seriously as if someone were holding a gun to the head of everyone you know). 

My pessimism tells me that most people will be unwilling to make those sacrifices until it is too late. Or, maybe I’m just projecting because I am unwilling to make those sacrifices. For the sake of humanity, we can only hope that the rest of the world is not as lazy as I am.

P.S. Here are some other resources if you are interested in understanding artificial intelligence and LLMs.

The above is a playlist (you can see the entire playlist here) from the YouTube channel 3Blue1Brown explaining how machine learning and LLMs work. I’ve not found a better resource for getting an understanding of how modern artificial intelligence works.

The above video goes through the whole process of training an LLM and how an LLM works in an easy-to-understand presentation.

The above video discusses some of the papers that have led to the big breakthroughs ushering in our new age of LLMs. The papers she discusses are:

This final article is interesting, too, because recently (September of 2025) the Model Context Protocol (MCP) was used with Claude in an attempted hack of U.S. companies and government agencies, attributed by Anthropic to a Chinese state-sponsored group. Anthropic says:

The attack relied on several features of AI models that did not exist, or were in much more nascent form, just a year ago:

  1. Intelligence. Models’ general levels of capability have increased to the point that they can follow complex instructions and understand context in ways that make very sophisticated tasks possible. Not only that, but several of their well-developed specific skills—in particular, software coding—lend themselves to being used in cyberattacks.
  2. Agency. Models can act as agents—that is, they can run in loops where they take autonomous actions, chain together tasks, and make decisions with only minimal, occasional human input.
  3. Tools. Models have access to a wide array of software tools (often via the open standard Model Context Protocol). They can now search the web, retrieve data, and perform many other actions that were previously the sole domain of human operators. In the case of cyberattacks, the tools might include password crackers, network scanners, and other security-related software.
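To make the “Agency” and “Tools” points concrete, here is a heavily simplified sketch (my own illustration, not Anthropic’s code) of what an agent loop looks like: the model is repeatedly asked what to do next, a harness executes the chosen tool, and the result is fed back in, with little or no human input between steps. The `call_model` function and the tools below are hypothetical stand-ins, not a real API.

```python
# Hypothetical sketch of an agent loop (not Anthropic's implementation).
# A model proposes actions; the harness executes tools and feeds results back.

def call_model(transcript: list[dict]) -> dict:
    """Hypothetical stand-in for an LLM API call.
    Returns either {"type": "tool", "tool": name, "argument": arg}
    or {"type": "final_answer", "content": text}."""
    raise NotImplementedError("plug in a real model client here")

# Hypothetical tools the model is allowed to invoke.
TOOLS = {
    "search_web": lambda query: f"(search results for {query!r})",
    "read_file": lambda path: open(path).read(),
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    transcript = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        action = call_model(transcript)                        # the model decides what to do next
        if action["type"] == "final_answer":
            return action["content"]
        result = TOOLS[action["tool"]](action["argument"])     # the harness runs the chosen tool
        transcript.append({"role": "tool", "content": result}) # the result goes back to the model
    return "(step limit reached)"
```

According to Anthropic’s write-up, it is this loop structure, combined with tool access, that allowed the attackers to chain together reconnaissance and exploitation tasks with only occasional human direction.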