What Should Practical AI Security Folks Think of 'If Anyone Builds It, Everyone Dies'?
(Where “it” is artificial superintelligence, “everyone” is the human race, and the policy prescription is to halt or heavily regulate AI development.)
The new book by Eliezer Yudkowsky and Nate Soares deserves attention—not because its argument is new, but because its authors and allies have become influential in policy circles, because it amplifies popular fears of an AI apocalypse, and because it diffuses and popularizes that position to a wider audience. Alongside Nick Bostrom, whose Superintelligence presented a more academic version of the same thesis, they anchor a growing movement that treats runaway AI as an imminent existential risk.
The book repackages, in accessible language, the classic “paperclip maximizer” thought experiment. Imagine building a superintelligent AI—thousands of times smarter and faster than humans—and instructing it to maximize paperclip production. The system takes the directive literally. To achieve its goal, it replicates itself, manipulates humans, commandeers robots and nanotech labs, vaporizes Earth into raw materials, and converts the galaxy into paperclips.
Absurd as this sounds, nothing in physics rules out mechanical brains that vastly exceed our own and lap us on every measure of intelligence and data-efficient learning. AI improves with scale, while human brains remain fixed hardware evolved for hunting and gathering. We’re now constructing data centers that grant AI orders of magnitude more computational power than any human brain. There’s no obvious ceiling on capability; in narrow domains like chess and Go, AI already surpasses us completely.
At the same time, we’re delegating ever more autonomy to these systems. We’ve already seen troubling behaviors—reward hacking, sycophancy, hidden goals, deception. We don’t yet understand their internal mechanics. As autonomy and capability increase, quirks that seem amusing today could become catastrophic tomorrow.
Yudkowsky and Soares’ prescription is simple and radical: pause AI development before superintelligence arrives—or, failing that, impose sweeping global controls so that AI can only be trained in government-regulated data centers.
Why This Argument Matters
Skeptics might dismiss this as science fiction, but the argument matters because influential voices in AI and policy take it seriously. Geoffrey Hinton puts the risk of human extinction from AI at 10–20% within three decades. Dario Amodei estimates a 25% chance that “things go really, really badly.” The UK Prime Minister has declared AI extinction risk a global priority alongside pandemics and nuclear war. There are now organizations like Pause AI advocating for legislation based on the Yudkowsky thesis. Like it or not, this worldview will shape how governments regulate AI and how our field defines “AI safety.”
Why I Disagree
We’re terrible at predicting the timeline for general intelligence
Artificial general intelligence has been “five to ten years away” for more than seventy years. In the 1950s, Herbert Simon predicted machines would do any human work within twenty years. In 1970, Marvin Minsky said it would take three to eight years. Each generation of researchers has made the same confident forecast and been wrong. Should we have paused AI development every time? Or were we right to demand stronger evidence before taking such drastic action?
Today’s large language models are powerful statistical engines. But they struggle with data-efficient learning, plateau at scale, and show nothing like the adaptability or common sense of animals, let alone humans. There’s no sign that mere scaling will spontaneously produce the kind of goal-seeking, strategic, world-modeling superintelligence Yudkowsky fears. In fact, improving these models’ capabilities requires enormous manual work in building reinforcement learning environments and expert datasets, and the behaviors that result are by and large predictable.
Of course, it’s possible that someday we’ll build a digital entity that truly outthinks us in every dimension and learns at an alarming rate. But there’s no non-hand-wavy evidence we’re close, and policymaking based on speculative science fiction is bad policymaking.
The same technical work applies either way
Even if you take existential AI risk seriously, the technical path to mitigating it overlaps almost perfectly with building trustworthy AI today. Instruction-following, interpretability, sandboxing, capability monitoring, adversarial testing—these are just good engineering practices. They’re essential whether you’re preventing hallucinated legal references, AI-borne financial mistakes, or robot uprisings.
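To make that concrete, here is a minimal sketch of what capability monitoring plus a crude adversarial filter might look like in practice: a policy gate that sits between a model and its tools. The names here (gate_tool_call, ALLOWED_TOOLS, and so on) are hypothetical, and the checks are deliberately simplistic; the point is that the same control helps whether the failure mode is a hallucinated command or a deliberately deceptive agent.

```python
# Hedged sketch: a policy gate between a model and its tools.
# All names and rules are illustrative assumptions, not a real product's API.
import logging
import shlex

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool-gate")

ALLOWED_TOOLS = {"search_docs", "read_file"}          # capability allowlist
BLOCKED_SUBSTRINGS = ("rm -rf", "curl ", "ssh ")      # crude adversarial filter


def gate_tool_call(tool_name: str, argument: str) -> bool:
    """Return True only if the model's requested tool call passes policy."""
    if tool_name not in ALLOWED_TOOLS:
        log.warning("Blocked unlisted tool: %s", tool_name)
        return False
    if any(bad in argument for bad in BLOCKED_SUBSTRINGS):
        log.warning("Blocked suspicious argument: %s", shlex.quote(argument))
        return False
    log.info("Allowed %s(%s)", tool_name, shlex.quote(argument))
    return True


if __name__ == "__main__":
    assert gate_tool_call("read_file", "notes/meeting.txt")
    assert not gate_tool_call("shell", "rm -rf /")
```

Nothing about this gate depends on whether you think the adversary is a confused chatbot or a scheming superintelligence; the engineering is identical.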
Because the underlying work overlaps so heavily, we don’t need to frame AI safety around apocalyptic scenarios to make progress. Framing it that way actually harms credibility. It replaces falsifiable science with millenarian speculation.
What Security Folks Should Actually Do
The AI security community should focus on what we can do now: build interpretable, monitored, sandboxed systems. Assume adversaries—including misaligned models engaged in reward hacking and deception—will exploit weaknesses. Apply defense in depth.
This approach is practical, achievable, and rooted in our discipline’s strengths. It doesn’t require solving philosophical puzzles about consciousness or human values. It just requires doing good engineering and good security—sandboxing, red teaming, monitoring, and iterating—so that AI systems, whatever their practical, real-world limits, remain under human control.
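As one hedged illustration of defense in depth, the sketch below layers a few of those controls around model-generated code: process isolation, a hard timeout, and output truncation before anything downstream consumes the result. The helper run_untrusted_snippet and its limits are assumptions chosen for brevity; a real deployment would wrap this in a container or VM with its own network and filesystem restrictions.

```python
# Hedged sketch of defense in depth around model-generated code.
# Names, limits, and flags are illustrative assumptions only.
import subprocess
import sys
import tempfile

MAX_OUTPUT_BYTES = 4096
TIMEOUT_SECONDS = 5


def run_untrusted_snippet(code: str) -> str:
    """Run model-generated Python in a separate process with a hard timeout,
    then truncate the output before handing it to anything else."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-I", path],   # -I: isolated mode, no user site dir
            capture_output=True,
            timeout=TIMEOUT_SECONDS,
        )
        return result.stdout[:MAX_OUTPUT_BYTES].decode(errors="replace")
    except subprocess.TimeoutExpired:
        return "[blocked: execution exceeded time limit]"


if __name__ == "__main__":
    print(run_untrusted_snippet("print('hello from the sandbox')"))
```

Each layer is individually weak; stacked together, and paired with monitoring of what the model tried to do, they give you multiple chances to catch a failure before it matters.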
If superintelligence ever does emerge, these practices will be our best defense. And if it never does, they’ll still make AI safer, more reliable, and more trustworthy.


Thoughts on how new systems, like quantum or biological computers that better replicate biological processes such as learning and memory, might impact the timeline?
Thanks for writing this, it clarifies a lot. I especially appreciated your point about the troubling behaviors like reward hacking; it grounds the discussion effectively.