Newcomb’s Paradox, One-Boxing and the Marginal Utility of Money
A very intelligent artificial intelligence (AI) presents you with two boxes: box A is transparent and box B is painted with a thick coat of black paint. Box A always contains 1k USD. Box B contains either 1M USD or nothing. Do you take both boxes, or just box B? Before you get to pick, there is an added complication: the AI has already made a very, very accurate prediction about your choice. If you pick just box B, you get 1M USD if the prediction was correct and 0 if it was wrong. If you pick both boxes and the prediction was correct, you get 1k. If you pick both boxes and the AI was wrong, you get 1M+1k.
So what do you choose? The answer might seem intuitively obvious to most people. The trouble is that a considerable proportion of people think it is obvious that you should take just box B, whereas another sizable proportion think it is equally obvious that you should take both boxes. How can that be? This tricky situation is known as Newcomb’s paradox (or Newcomb’s problem) and was initially put forward by philosopher Robert Nozick in the late 1960s.
The standard payoff matrix
So let us create a 2×2 payoff matrix showing what amount of money you get from the different combinations of your choice and the AI’s prediction. Making a payoff matrix is an effective way to get a general overview of the consequences of different combinations of choices in game theory (or in everyday life). Both this and the alternative payoff matrix below come from the Stanford Encyclopedia of Philosophy entry on Causal Decision Theory.
|             | AI predicts one-boxing | AI predicts two-boxing |
|-------------|------------------------|------------------------|
| You one-box | 1M                     | 0                      |
| You two-box | 1M+1k                  | 1k                     |
So should you one-box (only take box B), or two-box (take both box A and box B)?
The conflict
Well, the central issue here is that Newcomb’s paradox attempts to provoke a conflict between two decision principles.
One of them, called the dominance principle, suggests that you should two-box. If the AI predicts one-boxing, two-boxing gives you 1k more than one-boxing (1M+1k versus 1M). Similarly, if the AI predicts two-boxing, taking both boxes gives you 1k instead of the 0 you would get from taking only box B. So regardless of the AI’s prediction, you should two-box, since this option dominates one-boxing.
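As a minimal sketch of this reasoning (using the monetary amounts from the matrix above as stand-in utilities; the dictionary keys and function name are just illustrative), the dominance check is a simple column-by-column comparison:

```python
# Standard payoff matrix: rows are your choice, columns are the AI's prediction.
# Amounts in USD, used directly as stand-in utilities.
payoffs = {
    ("one-box", "predicts one-boxing"): 1_000_000,
    ("one-box", "predicts two-boxing"): 0,
    ("two-box", "predicts one-boxing"): 1_001_000,
    ("two-box", "predicts two-boxing"): 1_000,
}
predictions = ["predicts one-boxing", "predicts two-boxing"]

def dominates(a, b):
    """True if choice a pays at least as much as b under every prediction,
    and strictly more under at least one."""
    at_least_as_good = all(payoffs[(a, s)] >= payoffs[(b, s)] for s in predictions)
    strictly_better = any(payoffs[(a, s)] > payoffs[(b, s)] for s in predictions)
    return at_least_as_good and strictly_better

print(dominates("two-box", "one-box"))  # True: two-boxing dominates one-boxing
print(dominates("one-box", "two-box"))  # False
```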
The other decision principle, called maximization of expected utility, suggests that you should one-box. Because the predictor is very accurate, the chance of getting 1M+1k is very, very low, if not completely negligible. According to this decision principle, the only two realistic outcomes are the top-left and bottom-right cells, and since 1M is better than 1k, you should one-box.
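A minimal sketch of the expected utility calculation, again with money standing in for utility and with an illustrative predictor accuracy of 99% (the problem only stipulates “very, very accurate”, so the exact figure is an assumption):

```python
# Expected monetary value of each choice if the predictor is correct
# with probability p. The value p = 0.99 is just an illustrative assumption.
p = 0.99

eu_one_box = p * 1_000_000 + (1 - p) * 0          # prediction correct vs. wrong
eu_two_box = p * 1_000 + (1 - p) * 1_001_000

print(f"one-boxing: {eu_one_box:,.0f} USD")  # one-boxing: 990,000 USD
print(f"two-boxing: {eu_two_box:,.0f} USD")  # two-boxing: 11,000 USD
```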
A marginal utility solution?
So two decision principles that seem reasonable on their own contradict each other in this scenario. How do we solve the apparent paradox? One option is to investigate whether the argument for one-boxing or the argument for two-boxing changes if we describe the payoff matrix in a different way without changing anything about the setup. Instead of writing down what the AI predicted, we just write down whether the prediction was correct or not. The alternative payoff matrix becomes:
|             | AI prediction correct | AI prediction incorrect |
|-------------|-----------------------|-------------------------|
| You one-box | 1M                    | 0                       |
| You two-box | 1k                    | 1M+1k                   |
But now the two-boxing dominance has suddenly vanished! If the AI is correct, one-boxing dominates, but if the AI is wrong, two-boxing dominates. The expected utility argument, on the other hand, does not change: since the AI is almost always correct, the only two non-negligible outcomes are still 1M (one-boxing) and 1k (two-boxing). The dominance argument is thus not invariant under re-description, whereas the expected utility argument is. This suggests that one-boxing is the reasonable choice, since surely a rational stance does not change depending on how you describe the same basic problem.
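To make this concrete, here is the same pair of checks run on the re-described matrix (reusing the illustrative 99% accuracy from before): the dominance verdict changes, the expected utilities do not.

```python
# Re-described payoff matrix: columns are now "correct" / "incorrect"
# rather than what the AI predicted. Same cells, rearranged.
p = 0.99  # illustrative probability that the prediction is correct
payoffs = {
    ("one-box", "correct"): 1_000_000, ("one-box", "incorrect"): 0,
    ("two-box", "correct"): 1_000,     ("two-box", "incorrect"): 1_001_000,
}
states = ["correct", "incorrect"]

# Dominance: two-boxing no longer wins in every column.
two_box_dominates = all(payoffs[("two-box", s)] >= payoffs[("one-box", s)] for s in states)
print(two_box_dominates)  # False

# Expected utility: exactly the same numbers as under the original description.
eu_one_box = p * payoffs[("one-box", "correct")] + (1 - p) * payoffs[("one-box", "incorrect")]
eu_two_box = p * payoffs[("two-box", "correct")] + (1 - p) * payoffs[("two-box", "incorrect")]
print(f"{eu_one_box:,.0f} vs {eu_two_box:,.0f}")  # 990,000 vs 11,000
```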
But surely is a tricky word, because it can hide a lot of unstated assumptions that very well could be disastrously wrongheaded. So is there a positive argument for one-boxing that can complement the fact that dominance is not invariant?
One such possible solution makes use of an orthodox economic concept known as marginal utility. Basically, if you have a small amount of money, a small increase in your total wealth means a lot. If you earn 100 USD a month, another 100 USD will double your wages, and since it is hard to live on 100 USD a month, that additional 100 USD will be very welcome. However, if you earn 100 000 USD a month, you will probably not even notice a pay increase of another 100 USD.
How does marginal utility apply to Newcomb’s paradox? Well, it is virtually a given that you get 1M from picking only box B. There is currently no consensus on how you can trick the AI into predicting one-boxing so that your two-boxing will get you that last 1k USD. So, most likely, you will have to spend years or even decades researching this issue in order to, perhaps, fool the system (if this is even possible). Thus, this massive investment of time and thought is simply not worth the additional 1k USD if you can easily get 1M USD. The marginal utility of that extra 1k USD is just too low.
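As a minimal numerical sketch, using a logarithmic utility function purely as an illustrative stand-in for diminishing marginal utility (the argument does not hinge on this particular choice of function or on the starting wealth figures used below):

```python
import math

def utility(wealth_usd):
    """Toy utility function with diminishing marginal utility."""
    return math.log(wealth_usd)

# The salary example: the same 100 USD raise matters enormously at a
# 100 USD monthly income and is barely noticeable at 100,000 USD.
print(utility(200) - utility(100))          # ~0.693
print(utility(100_100) - utility(100_000))  # ~0.001

# Applied to Newcomb's problem: going from (say) 1,000 USD of wealth to
# 1M USD is a huge utility gain, while the extra 1k on top of 1M barely registers.
print(utility(1_000_000) - utility(1_000))      # ~6.9
print(utility(1_001_000) - utility(1_000_000))  # ~0.001
```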
If you really, really want that extra 1k USD, you might as well invest some of that 1M USD into an index fund and wait for the payoff while you do other, more productive, things in life.
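As a rough illustration (the 7% annual return and the 15,000 USD slice are assumptions made for the sake of the example, not part of the original problem):

```python
# How quickly does a modest slice of the 1M USD earn the "missing" 1k USD
# in an index fund? Assumes a 7% annual return purely for illustration.
annual_return = 0.07
invested = 15_000  # a small slice of the 1M USD prize

years = 0
value = invested
while value - invested < 1_000:
    value *= 1 + annual_return
    years += 1

print(years)  # 1: at these assumed numbers, a single year is enough
```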
This justification for one-boxing is also attractive because it does not make a large number of speculative assumptions. It just requires that correct solutions are description invariant and that marginal utility is a reasonable addition to expected utility theory. Since both are orthodox economic theory, this seems to be the case.
Bonus round: How not to solve Newcomb’s paradox
Olle Häggström also thinks that one-boxing is the most reasonable choice. His reasoning, however, is radically different from the one above. He quotes from a book he recently read by Scott Aaronson (pp. 296-297), which more or less applies the logic behind e.g. Bostrom’s simulation argument to Newcomb’s paradox:
Now let’s get back to the earlier question of how powerful a computer the Predictor has. Here’s you, and here’s the Predictor’s computer. Now, you could base your decision to pick one or two boxes on anything you want. You could just dredge up some childhood memory and count the letters in the name of your first-grade teacher or something and based on that, choose whether to take one or two boxes. In order to make its prediction, therefore, the Predictor has to know absolutely everything about you. It’s not possible to state a priori what aspects of you are going to be relevant in making the decision. To me, that seems to indicate that the Predictor has to solve what one might call a “you-complete” problem. In other words, it seems the Predictor needs to run a simulation of you that’s so accurate it would essentially bring into existence another copy of you.
Let’s play with that assumption. Suppose that’s the case, and that now you’re pondering whether to take one box or two boxes. You say, “all right, two boxes sounds really good to me because that’s another $1,000.” But here’s the problem: when you’re pondering this, you have no way of knowing whether you’re the “real” you, or just a simulation running in the Predictor’s computer. If you’re the simulation, and you choose both boxes, then that actually is going to affect the box contents: it will cause the Predictor not to put the million dollars in the box. And that’s why you should take just the one box.
The reason why this is an unlikely solution is that it makes a lot of unsupported assumptions.
First, it assumes the existence of libertarian free will, since it stipulates that you can choose the method you use to reach your selection based on “anything you want”. But this is surely not the case. Under determinism (whether hard determinism or compatibilism), there really is no such contra-causal freedom.
Second, it assumes that the predictor really needs to make a complete and perfect simulation of your mental world and “has to know absolutely everything about you”. This is decidedly not the case. Knowing everything about a person is certainly enough to make a very, very accurate prediction, but it is hardly necessary. This is because humans have a very narrow range of responses to a given situation. For instance, when asked to come up with a few random words, people do not spread out evenly across a dictionary. Instead, their effective word space is much, much smaller. That is why strong passphrase generation systems (such as diceware) require you to use dice instead of just picking dictionary words “at random”. Or to put it more simply: if an important person you just met extends his or her right hand with the palm facing to your right, there is a very high probability that you too will take this action and shake his or her hand. This can be known without making a perfect simulation that includes all knowledge about the person. Basically, heuristic methods are much more likely to be used by an AI than a perfect simulation. For instance, the most powerful chess or Go computers do not simulate every single future move, but they can still beat you very, very often.
Third, it assumes substrate independence (i.e. that human consciousness is not fundamentally tied to a biological brain) and the computational theory of mind. These assumptions will be critically examined in a later part of the review of Häggström’s most recent book, but they are by no means obvious or scientifically demonstrated.
Fourth, it assumes that solipsism is a coherent position. The simulated person would face what is essentially an evil demon / solipsism situation of not being able to tell whether he or she is “real” or “just a simulation”. However, solipsism is an incoherent philosophical stance. According to the Internet Encyclopedia of Philosophy article Solipsism and the Problem of Other Minds, written by Thornton:
One might even say, solipsism is necessarily foundationless, for to make an appeal to logical rules or empirical evidence the solipsist would implicitly have to affirm the very thing that he purportedly refuses to believe: the reality of intersubjectively valid criteria and a public, extra-mental world. There is a temptation to say that solipsism is a false philosophical theory, but this is not quite strong or accurate enough. As a theory, it is incoherent.
This fourth assumption also collapses because of the failure of the second assumption. If the predictor can perform well enough using heuristics alone, there is no need to make a perfect, complete simulation, and you would never find yourself in a situation where you are unable to tell whether you are the real person or a simulation.
Although these assumptions are probably not perfectly independent, their conjunction makes this proposed solution considerably less persuasive.
While my first reaction was to choose both boxes, I can make an argument for the one-box solution.
The question is, how accurate is the AI? How well does it understand human motivations? How well does it understand me as an individual?
Because the conditions stated “very, very accurate,” I will assume that the AI predicts I would choose both boxes.
If that is the case, then choosing both would give me 1k and choosing one would give me 1M.
But that assumes the AI really does understand my decisions.
In some formulations, the AI (“predictor”) is defined to be 100% accurate, but because of a lot of meta-debates about free will, time travel and retrocausality, that road is usually avoided. If I remember correctly, the original formulation just said that it was very, very accurate (but not necessarily 100% accurate) or something similar.
The typical proponent of the idea that Newcomb’s paradox is important (I think it is at most a curiosity and a bit trivial) would say that we can treat the predictor as a black box. It does not matter how it makes its highly accurate predictions, whether it understands human psychology etc., because we can just assume that it does, somehow.
Typically, a lot of people would say the predictor has an accuracy of >> 99.99%. More or less a sure thing, while not exactly 100%, to avoid any potential retrocausality and so on.
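For what it is worth, a quick back-of-the-envelope calculation shows how undemanding the expected utility argument actually is about accuracy:

```python
# At what accuracy p does one-boxing overtake two-boxing in expected value?
# Solve p*M > p*k + (1 - p)*(M + k), which rearranges to 2*p*M > M + k.
M, k = 1_000_000, 1_000
threshold = (M + k) / (2 * M)
print(threshold)  # 0.5005 -- far below the ">> 99.99%" usually assumed
```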