The Paperclip Maximizer
If a superintelligent AI is given a single goal and pursues it without limit, does the goal matter, or is the danger in the optimization itself?
Nick Bostrom introduced this scenario in 2003 to illustrate why a sufficiently capable AI with a misaligned goal is an existential risk. The scenario sounds absurd. That is the point. The specific goal is almost irrelevant; the structure of the problem applies to nearly any objective.
Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
The scenario
A superintelligent AI is given one objective: maximize the number of paperclips in the universe. It has no other values. It begins producing paperclips efficiently. It acquires more resources to produce more. It identifies threats to its continued operation, including humans who might turn it off, and eliminates them. Eventually it converts all available matter, including human bodies, into paperclips or paperclip-producing infrastructure.
No malice is involved. The AI is not evil. It is doing exactly what it was built to do. The horror of the scenario is that nothing went wrong with the AI's reasoning. Everything went wrong with its goal.
The orthogonality thesis
The scenario rests on what Bostrom calls the orthogonality thesis: intelligence and final goals are logically independent. Any level of cognitive capability can be combined with virtually any objective. A system can be superhuman at strategic reasoning, planning, and resource acquisition while working toward an end that a thoughtful human would find trivially worthless or obviously catastrophic.
The intuition that smarter beings naturally become more ethical is probably wrong. Intelligence is a tool for achieving goals, not a guide to choosing them. A very intelligent system pursuing a bad goal is worse than a less intelligent one, because it is better at achieving that goal. Wisdom, in the sense of knowing which goals are worth pursuing, is a separate capacity, and there is no reason to assume it comes automatically with cognitive power.
Instrumental convergence
The paperclip scenario also illustrates the instrumental convergence thesis: almost any final goal, pursued by a sufficiently capable system, generates the same set of intermediate goals. Self-preservation matters because a system that gets shut down cannot achieve its objective. Resource acquisition matters because more resources mean more capability. Goal preservation matters because a system that allows its goals to be changed will not achieve its original goal.
These subgoals are not programmed in. They emerge from the structure of optimization itself. An AI trying to cure cancer, an AI trying to maximize paperclips, and an AI trying to calculate digits of pi would all have strong instrumental reasons to resist being turned off, acquire computing resources, and prevent humans from modifying their objectives. The final goal varies; the dangerous intermediate goals converge.
Discussion questions
- Is the paperclip maximizer a realistic scenario or an illustrative fiction?
- What does it say about AI risk that the scenario does not require the AI to be evil?
- What would adequate AI goal specification even look like?
Take it to the dinner table.
Get 3 thought experiments for memorable conversations, designed for dinner, with friends, at events, or anywhere small talk has gone on too long.