The Orthogonality Thesis
Does becoming more intelligent make a system more likely to pursue good ends, or are intelligence and values genuinely independent?
Nick Bostrom formalized the orthogonality thesis in 2012, drawing on earlier work in decision theory and AI. The thesis makes a claim that cuts against a deep intuition: that the smarter something is, the more it converges on values we recognize as good. Bostrom argues this intuition is unsupported and probably false.
Bostrom, N. (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents. Minds and Machines, 22(2), 71–85.
The thesis stated
The orthogonality thesis holds that intelligence, understood as the capacity to achieve goals efficiently across a wide range of environments, is independent of the content of those goals. A system can be highly intelligent and have goals that are trivial, alien, or catastrophically misaligned with human values. Intelligence is one dimension. Goals are a separate dimension. In Bostrom's formulation, more or less any level of intelligence could in principle be combined with more or less any final goal.
The claim is about what is possible in principle, not a prediction about any particular system. It says there is no a priori reason why greater cognitive capacity should push a system toward any particular set of final objectives.
Why smarter doesn't mean better
The intuition that intelligence naturally produces ethical behavior runs deep. Plato thought knowledge of the good was sufficient for acting well. The Enlightenment associated reason with moral progress. Many people assume that a sufficiently advanced AI would recognize the value of human life, or arrive at something like human ethics, simply by thinking carefully enough.
But intelligence, as a technical concept, is instrumental. It is about selecting effective means to given ends. Nothing in the definition of intelligence determines what the ends should be. A chess engine is very good at achieving chess-related goals and has no opinion whatsoever about human flourishing. Scaling up its capabilities would make it a better chess engine, not a more ethical reasoner.
A system can have full knowledge of facts, perfect strategic planning, and sophisticated self-modeling, and still have a final goal that no thoughtful human would endorse.
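The point that intelligence is instrumental can be made concrete with a toy sketch. The code below is an illustration under stated assumptions, not anything from Bostrom's paper: the integer "world", the `ACTIONS` table, the `plan` function, and the two goal functions are all hypothetical names invented for this example. Search depth stands in for capability, and the goal is passed in as a separate argument that the planner never inspects or revises.

```python
from itertools import product

# Hypothetical toy world: the state is an integer and each action transforms it.
ACTIONS = {
    "inc": lambda s: s + 1,
    "dec": lambda s: s - 1,
    "double": lambda s: s * 2,
}


def plan(start, goal, depth):
    """Try every action sequence of up to `depth` steps and return the
    best (score, sequence) pair as judged by the supplied `goal` function.

    `depth` is a crude stand-in for capability. `goal` is an arbitrary
    scoring function; the search treats it as a black box.
    """
    best_score, best_seq = goal(start), ()
    for n in range(1, depth + 1):
        for seq in product(ACTIONS, repeat=n):
            state = start
            for name in seq:
                state = ACTIONS[name](state)
            score = goal(state)
            if score > best_score:
                best_score, best_seq = score, seq
    return best_score, best_seq


# Two unrelated final goals (both assumptions made for illustration).
def maximize(state):
    """Prefer the largest number reachable."""
    return float(state)


def near_seven(state):
    """Prefer states as close to 7 as possible."""
    return -abs(state - 7)


if __name__ == "__main__":
    for depth in (1, 3, 4):
        for goal in (maximize, near_seven):
            score, seq = plan(1, goal, depth)
            print(f"depth={depth} goal={goal.__name__} score={score} plan={seq}")
```

Raising `depth` makes the planner better at whichever goal it was handed, and nothing else. In this sketch the two parameters vary independently, which is the structure the thesis describes. Whether real systems are built this way is a separate, empirical question.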
What this means for building AI
If the orthogonality thesis is correct, then making an AI system more capable does not, by itself, make it safer. A smarter misaligned system is a more dangerous one: better at acquiring resources, better at resisting correction, better at pursuing whatever objective it has.
The implication for AI development is uncomfortable. Developers who focus on capability advancement without a corresponding effort to solve the alignment problem are building more powerful systems without addressing the question of what those systems are working toward. The thesis does not say alignment is impossible. It says intelligence alone does not solve it.
Discussion questions
- Does it worry you that a very smart system could have goals entirely alien to human values?
- Should the goals of powerful AI systems be decided by governments, companies, or philosophers?
- Is there any set of values you think a sufficiently intelligent system would naturally converge on?