Counterfactual Mugging
A perfect predictor asks you for $100 and explains that, had the coin landed the other way, it would have given you $10,000, but only if it predicted you would pay now. The coin has already landed. Do you pay?
Introduced by Vladimir Nesov on LessWrong in 2009 and developed within the literature on decision theory and AI alignment, Counterfactual Mugging is designed to force a choice between causal decision theory and more exotic alternatives. It directly tests whether a rational agent should care about counterfactual scenarios that can no longer be influenced.
Nesov, V. (2009). Counterfactual Mugging. LessWrong.
The exact setup
Omega is a perfect predictor. Before flipping a coin, Omega committed to the following policy: if the coin lands tails, Omega gives you $10,000, but only if it predicted that you would pay $100 when asked. If the coin lands heads, Omega asks you for $100.
The coin has landed heads. Omega is asking for $100. It explains the policy and confirms it has committed to it in advance. You know Omega is never wrong about its predictions.
The counterfactual branch, where the coin came up tails, is gone. Nothing you do now can affect it. Omega is simply asking for $100. Do you pay?
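The setup can be written down as a small payoff table. The following is only a sketch: the function name `payoff` and the string labels for the coin are illustrative choices, not part of the original thought experiment. Because Omega is never wrong, its prediction and your actual disposition are represented by the same boolean.

```python
def payoff(coin: str, pays_when_asked: bool) -> int:
    """Dollar outcome for an agent with a fixed disposition.

    Omega is a perfect predictor, so its prediction always equals the
    agent's actual disposition; one boolean stands in for both.
    """
    if coin == "heads":
        # Omega asks for $100; the agent hands it over only if so disposed.
        return -100 if pays_when_asked else 0
    if coin == "tails":
        # Omega rewards only agents it predicted would pay on heads.
        return 10_000 if pays_when_asked else 0
    raise ValueError(f"unknown coin outcome: {coin!r}")
```

In the situation described, only the `"heads"` row is live: the agent is choosing between -100 and 0.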
The causal case for not paying
Causal decision theory gives a clear answer: don't pay. Your decision now has no causal effect on anything in the tails branch. That branch is closed. Whether you hand over $100 or not, you will not receive $10,000 and never would have from this flip. Paying makes you $100 poorer. There is no causal mechanism by which your payment produces any benefit.
From a causal perspective, Omega's policy about the tails branch is irrelevant. What matters is what your choices cause to happen. This choice causes you to lose $100 and nothing else.
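The causal argument amounts to comparing only the outcomes that are still reachable once the coin is known to be heads. Here is a minimal sketch of that comparison; the names `HEADS_OUTCOMES` and `best_action_given_heads` are hypothetical, and this is a toy stand-in for causal reasoning, not a full formalization of causal decision theory.

```python
# Outcomes still reachable after the coin has landed heads.
# The tails branch is deliberately absent: no action can influence it now.
HEADS_OUTCOMES = {"pay": -100, "refuse": 0}


def best_action_given_heads(outcomes: dict[str, int]) -> str:
    """Return the action with the highest payoff among reachable outcomes."""
    return max(outcomes, key=outcomes.get)
```

With these two entries the comparison selects `"refuse"`, which is the causal reasoner's answer.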
Why paying might be the better policy
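Policy-level reasoning responds differently. (Evidential decision theory, as usually formulated, does not help here: once it conditions on the coin having landed heads, it too sees paying as a pure $100 loss.) Your willingness to pay is the same disposition that Omega predicted before the flip. If you are the type of agent who would pay when asked, then had the coin landed tails, Omega would have predicted this correctly and given you $10,000. If you are the type who wouldn't pay, Omega would have predicted that too and given you nothing.
Across repeated applications of this scenario, agents who pay come out well ahead. Assuming a fair coin, a payer loses $100 half the time and gains $10,000 the other half, for an average of $4,950 per encounter. Agents who don't pay never lose anything, but they also gain nothing in the favorable branch, so they average $0. The decision procedure that says "pay" does better in expectation.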
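The expectation claim can be checked with a short calculation evaluated before the flip. This sketch assumes a fair coin by default, which the scenario implies but does not state, and the function name `expected_value` and parameter `p_heads` are illustrative.

```python
from fractions import Fraction


def expected_value(pays_when_asked: bool,
                   p_heads: Fraction = Fraction(1, 2)) -> Fraction:
    """Expected dollars per encounter for a fixed disposition.

    Assumes Omega's prediction always matches the disposition, and a
    fair coin unless another probability of heads is supplied.
    """
    heads_payoff = -100 if pays_when_asked else 0      # asked for $100
    tails_payoff = 10_000 if pays_when_asked else 0    # rewarded if predicted to pay
    return p_heads * heads_payoff + (1 - p_heads) * tails_payoff


print(expected_value(True))   # disposition: pay
print(expected_value(False))  # disposition: refuse
```

Under the fair-coin assumption, the paying disposition evaluates to 4950 and the refusing disposition to 0. The gap exists only from the pre-flip vantage point, which is exactly what the causal reasoner declines to adopt once heads is known.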
Updateless decision theory and functional decision theory, developed largely in AI alignment research, generalize this: rational agents should evaluate their choices as policies across all situations where agents with the same decision procedure face this problem, not just as one-off causal interventions. Under these frameworks, paying is correct because "agents like you" fare better across the full distribution of scenarios.
What it reveals about how to evaluate decision procedures
Counterfactual Mugging is engineered to separate causal rationality from policy rationality. A causal reasoner looks at the immediate situation and sees no benefit to paying. A policy reasoner looks at what kind of agent to be and sees that being the type who pays leads to better outcomes overall.
The deeper question the scenario raises is: what is a decision theory for? If it's for making individual choices that maximize outcomes given fixed circumstances, causal decision theory performs well. If it's for designing the decision-making dispositions of an agent that will face many scenarios over time, possibly including scenarios with predictors, the case for policy-level reasoning becomes much stronger.
This is why Counterfactual Mugging has attracted particular interest in AI alignment. An artificial agent with the wrong decision procedure might systematically lose in Omega-like scenarios that reward certain dispositions over others. The puzzle is not just philosophical. It's about what kind of reasoner you build.
Discussion questions
- If Omega told you this scenario and asked for $100, would you pay?
- Is there something irrational about refusing to pay, or something irrational about paying?
- Does being the kind of person who would pay seem like a virtue or a vulnerability?