On Voting for Third Parties

If your favorite candidate in an election is a third party candidate, should you vote for him?

This question has confused me. I have changed my mind many times, and I have recently changed my mind again. I would like to talk about some of the arguments in both directions and explain the reason for my most recent change.

Con 1) Voting for a third party is throwing your vote away.

We have all heard this argument before, and it is true. It is an unfortunate consequence of the plurality voting system. Plurality is a horrible system and there are many better alternatives, but it is what we are stuck with for now. If you vote for a third party, the same candidate will be elected as if you had not voted at all.

Pro 1) The probability that your vote changes the election is negligible. All your vote does is add one to the number of people who voted for a given candidate. Your vote for the third party candidate therefore matters more because it changes a small number by a relatively larger amount.

This argument is actually an empirical claim, and I am not sure how well it holds up. It is easy to study the likelihood that your vote changes the election; one study finds that it varies from roughly 10^-7 to 10^-11 for American presidential elections. However, it is not clear to me just how much your vote affects the strategies of political candidates and voters in the future.

Pro 2) The probability that your vote changes the election or future elections is negligible. The primary personal benefit of voting is the personal satisfaction of voting. This personal satisfaction is maximized by voting for the candidate you agree with the most.

I think that many people, if given the choice between changing which of the two primary parties wins the next presidency and being paid 10^7 times the cost of the gas they spent driving to the polls, would take the money. I am not one of them, but any of those people must agree that voting is a bad investment if you do not consider the personal satisfaction. However, I think I might get more satisfaction out of doing my best to change the election rather than casting a vote that does not matter.

Con 2) Actually, if you use a reflexive decision theory, you are much more likely to change the election, so you should vote like it matters.

Looking at the problem like a timeless decision agent, you see that your choice about voting is probably correlated with that of many other people. Your voting for a primary party is logically linked with other people voting for a primary party, and the people whose votes are logically linked with yours are more likely to agree with you politically. This could bring the chance of changing the election out of the negligible zone, where you should be deciding based on political consequences.

Pro 3) Your morality should encourage you to vote honestly.

It is not clear to me that I should view a vote for my favorite candidate as an honest vote. If we used the anti-plurality system, where the candidate with the fewest votes wins, then a vote for my favorite candidate would clearly not be considered an honest one. The “honest” vote should be the vote that you think will maximize your preferences, which might be a vote for a primary party.

Pro 4) Strategic voting is like defecting in the prisoner’s dilemma. If we all cooperate and vote honestly, we will get the favorite candidate of the largest number of people. If not, then we could end up with someone much worse.

The problem with this is that if we all vote honestly, we get the plurality winner, and the plurality winner is probably not all that great a choice. The obvious voting strategy is not the only problem with plurality. Plurality also discourages compromise, and the results of plurality are changed drastically by honest vote splitting. The plurality candidate is not a good enough goal that I think we should all cooperate to achieve it.

I have decided that in the next election, I will vote for a primary party candidate. I changed my mind almost a year ago after reading Stop Voting for Nincompoops, but after recent further reflection, I have changed my mind back. I believe that Con 1 is valid, Con 2 and the other criticisms above adequately respond to Pro 1 and Pro 2, and I believe that Pro 3 and Pro 4 are invalid for the reasons described above. I would love to hear any opinions on any of these arguments, and would love even more to hear arguments I have not thought of yet.

Even Odds

Let’s say that you are at your local Less Wrong meetup and someone makes some strong claim and seems very sure of himself, “blah blah blah resurrected blah blah alicorn princess blah blah 99 percent sure.” You think he is probably correct, you estimate a 67 percent chance, but you think he is way overconfident. “Wanna bet?” you ask.

“Sure,” he responds, and you both check your wallets and have 25 dollars each. “Okay,” he says, “now you pick some betting odds, and I’ll choose which side I want to pick.”

“That’s crazy,” you say. “I am going to pick the odds so that I cannot be taken advantage of, which means that I will be indifferent between the two options you might pick, which means that I will expect to gain 0 dollars from this transaction. I won’t take it. It is not fair!”

“Okay,” he says, annoyed with you. “We will both write down the probability we think that I am correct, average them together, and that will determine the betting odds. We’ll bet as much as we can of our 25 dollars with those odds.”

“What do you mean by ‘average’? I can think of at least four possibilities. Also, since I know your probability is high, I will just choose a high probability that is still less than it, to shift the odds in my favor regardless of my actual belief. Your proposition is not strategy proof.”

“Fine, what do you suggest?”

You take out some paper, solve some differential equations, and explain how the bet should go.

Satisfied with your math, you share your probability, he puts 13.28 on the table, and you put 2.72 on the table.

“Now what?” he asks.

A third meetup member quickly takes the 16 dollars from the table and answers, “You wait.”

I will now derive a general algorithm for determining a bet from two probabilities and a maximum amount of money that people are willing to bet. This algorithm is both strategy proof and fair. The solution turns out to be simple, so if you like, you can skip to the last paragraph, and use it next time you want to make a friendly bet. If you want to try to derive the solution on your own, you might want to stop reading now.

First, we have to be clear about what we mean by strategy proof and fair. “Strategy proof” is clear: our algorithm should ensure that neither person believes they can increase their expected profit by lying about their probability. “Fair” is a little harder to define. There is more than one way to define “fair” in this context, but there is one way which I think is probably the best. When the players make the bet, they both will expect to make some profit. They will not both be correct, but they will both believe that they expect to make a profit. I claim the bet is fair if both players expect to make the same profit on average.

Now, let’s formalize the problem:

Alice believes S is true with probability p. Bob believes S is false with probability q. Both players are willing to bet up to d dollars. Without loss of generality, assume p+q>1. Our betting algorithm will output a dollar amount, f(p,q), for Alice to put on the table and a dollar amount, g(p,q) for Bob to put on the table. Then if S is true, Alice gets all the money, and if S is false, Bob gets all the money.

From Alice’s point of view, her expected profit will be p(g(p,q))+(1-p)(-f(p,q)).

From Bob’s point of view, his expected profit will be q(f(p,q))+(1-q)(-g(p,q)).

Setting these two values equal, and simplifying, we get that (1+p-q)g(p,q)=(1+q-p)f(p,q), which is the condition that the betting algorithm is fair.

For convenience of notation, we will define h(p,q) by h(p,q)=g(p,q)/(1+q-p)=f(p,q)/(1+p-q).

Now, we want to look at what will happen if Alice lies about her probability. If instead of saying p, Alice were to say that her probability was r, then her expected profit would be p(g(r,q))+(1-p)(-f(r,q)), which equals

p(1+q-r)h(r,q)+(1-p)(-(1+r-q)h(r,q))=(2p-1-r+q)h(r,q)

We want this value as a function of r to be maximized when r=p, which means that

-h+(2p-1-r+q)\frac{dh}{dr}=0,

which, since r=p, is the same as

-h+(-1+r+q)\frac{dh}{dr}=0.

Separation of variables gives us

\int\frac{1}{h}dh=\int\frac{1}{-1+r+q}dr,

which integrates to

ln(h)=C+ln(-1+r+q),

which simplifies to

h(r,q)=e^C(-1+r+q),

or equivalently,

h(p,q)=e^C(-1+p+q).

This gives the solution

f(p,q)=e^C(-1+p+q)(1+p-q)=e^C(p^2-(1-q)^2)

and

g(p,q)=e^C(-1+p+q)(1+q-p)=e^C(q^2-(1-p)^2).

It is quick to verify that this solution is actually fair and that both players’ expected profit is maximized by honest reporting of beliefs.
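If you want to check this without redoing the algebra, here is a small symbolic sketch of my own (using sympy, not part of the original post) that verifies the fairness condition and the first-order condition for honest reporting:

```python
# Symbolic check of the fairness and strategy-proofness claims (a sketch, not from the post).
import sympy as sp

p, q, r, d = sp.symbols('p q r d', positive=True)

f = d * (p**2 - (1 - q)**2)   # Alice's stake
g = d * (q**2 - (1 - p)**2)   # Bob's stake

# Fairness: Alice's and Bob's expected profits are equal.
alice_profit = p * g - (1 - p) * f
bob_profit = q * f - (1 - q) * g
assert sp.simplify(alice_profit - bob_profit) == 0

# Strategy-proofness: Alice's expected profit if she reports r instead of p.
f_r = d * (r**2 - (1 - q)**2)
g_r = d * (q**2 - (1 - r)**2)
alice_profit_r = p * g_r - (1 - p) * f_r

# The derivative in r vanishes at r = p, and the second derivative is -2d < 0,
# so honest reporting is a maximum.
assert sp.simplify(sp.diff(alice_profit_r, r).subs(r, p)) == 0
assert sp.simplify(sp.diff(alice_profit_r, r, 2) + 2 * d) == 0
```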

The constant factor e^C can be anything, and the most either player could ever have to put on the table is equal to this constant. Therefore, if both players are willing to bet up to d dollars, we should define

e^C=d.

Alice and Bob are willing to bet up to d dollars, Alice thinks S is true with probability p, and Bob thinks S is false with probability q. Assuming p+q>1, Alice should put in d(p^2-(1-q)^2), while Bob should put in d(q^2-(1-p)^2). I suggest you use this algorithm next time you want to have a friendly wager (with a rational person), and I suggest you set d to 25 dollars and require both players to say an odd integer percent to ensure a whole number of cents.
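For concreteness, here is a minimal Python sketch of the final algorithm; the function name and the rounding to cents are my own choices, not part of the post. Plugging in the numbers from the dialogue above recovers the $13.28 and $2.72 stakes.

```python
# A minimal sketch of the betting algorithm derived above (names are mine).
def even_odds_stakes(p, q, d=25.0):
    """Stakes for a fair, strategy-proof bet on a statement S.

    p: Alice's probability that S is true.
    q: Bob's probability that S is false.
    d: the most either player is willing to bet.

    Alice wins the whole pot if S is true; Bob wins it if S is false.
    Requires p + q > 1 (otherwise swap the players' roles).
    """
    if p + q <= 1:
        raise ValueError("Need p + q > 1; swap the roles of the two players.")
    alice_stake = d * (p**2 - (1 - q)**2)
    bob_stake = d * (q**2 - (1 - p)**2)
    return round(alice_stake, 2), round(bob_stake, 2)

# The dialogue above: he is 99 percent sure he is right, you give him 67 percent
# (so 33 percent that he is wrong).
print(even_odds_stakes(p=0.99, q=0.33))  # -> (13.28, 2.72)
```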

Subjective Altruism

Let us assume for the purpose of this argument that Bayesian probabilities are subjective. Specifically, I am thinking in terms of the model of probability expressed in model 4 here. That is to say that the meaning of P(A)=2/3 is “I as a decision agent care twice as much about possible worlds in which A is true as about possible worlds in which A is false.” Let us also assume that it is possible to have preferences about things that we will never observe.

Consider the following thought experiment:

Alice and Bob are agents who disagree about the fairness of a coin. Alice believes that the coin will come up heads with probability 2/3 and Bob believes the coin will come up tails with probability 2/3. They discuss their reasons for a long time and realize that their disagreement comes from different initial prior assumptions, and they agree that both people have rational probabilities given their respective priors. Alice is given the opportunity to gamble on behalf of Bob. Alice must call heads or tails, then the coin will be flipped once. If Alice calls the coin correctly, then Bob will be given a dollar. If she calls the coin incorrectly, then nothing happens. Either way, nobody sees the result of the coin flip, and Alice and Bob never interact again. Should Alice call heads or tails?

The meat of this question is this: when trying to be altruistic towards a person, should you maximize their expected utility under their priors or under your own? I will present an argument here, but feel free to stop reading now, think about it on your own, and post the results.

First of all, notice that there are actually three options:

1) Maximize your own expectation of Bob’s utility

2) Maximize Bob’s expectation of his utility

3) Maximize what Bob’s expectation of his utility would be if he were to update on the evidence of everything that you have observed.

At first it may have looked like the main options were 1 and 2, but I claim that 2 is actually a very bad option and the only real question is between options 1 and 3. Option 2 is stupid because, for example, it would cause Alice to call tails even if she has already seen the coin flip and it came up heads. There is no reason for Alice not to update on all of the information she has. The only question is whose prior she should update from. In this specific thought experiment, we are assuming that 2 and 3 are the same, since Alice has already convinced herself that her observations could not change Bob’s mind, but I think that in general options 1 and 3 are somewhat reasonable answers, while 2 is not.

Option 3 has the nice property that it does not have to observe Bob’s utility function. It only has to observe Bob’s expected utilities for the different choices. This is nice because in many ways “expected utility” seems like a more fundamental and possibly better defined concept than “utility.” We are trying to be altruistic towards Bob, so it seems natural to give Bob the most utility in the possible worlds that he “cares about” the most.

On the other hand, we want the possible worlds that we care about most to be as good as possible. We may never be able to observe whether or not Bob gets the dollar, but it is not just Bob who wants Bob to get the dollar. We also want Bob to get the dollar, and we want him to get it in the most important possible worlds, the worlds we assign a high probability to. What we want is for Bob to be happy in the worlds that are important. We may have subjectively assigned those possible worlds to be the most important ones, but from our standpoint as a decision agent, the worlds we assign high probability to really are more important than the other ones.

Option 1 is also simpler than option 3. We just have a variable for Bob’s utility in our utility function, and we do what maximizes our expected utility. If we took option 3, we would be maximizing something that is not just a product of our utilities with our probabilities.

Option 3 has some unfortunate consequences. For example, it might cause us to pray for a religious person even if we are very strongly atheist.

I prefer option 1. I care about the worlds that are simple and therefore are given high probability. I want everyone to be happy in those worlds. I would not sacrifice the happiness of someone in a simple/probable/important world just because someone else thinks another world is important. Probability may be subjective, but relative to the probabilities that I use to make all my decisions, Bob’s probabilities are just wrong.

Option 3 is nice in situations where Alice and Bob will continue interacting, possibly even interacting through mutual simulation. If Alice and Bob were given a symmetric scenario, then this would become a prisoner’s dilemma, where Alice choosing heads corresponds to defecting, while Alice choosing tails corresponds to cooperating. However, I believe this is a separate issue.

The Belief Signaling Trilemma

One major issue for group rationality is signaling with dishonest beliefs. Beliefs are used as signals to preserve your reputation or to show membership in a group. This happens subconsciously, and I believe it is the cause of most of the issues of both religions and political parties. It also happens on Less Wrong, most commonly through not sharing beliefs that you think people will disagree with.

First, let’s identify the problem. This is mostly from the viewpoint of ideal agents as opposed to actual humans. You are part of a community of rationalists. You discuss lots of issues, and you become familiar with the other members of your community. As a result, you start to learn which members of the community are the smartest. Of course, your measure of the intelligence of the members is biased towards people who say things that you agree with. The members who say things that you agree with build up more reputation in your mind. This reputation makes you more likely to trust other things that this person says. I am also a member of this community. I have opinions on many things, but there is one issue that I think really does not matter at all. On this issue, most of the community believes X, but I believe not X. By signaling belief in X, I increase my reputation in the community, and will cause other people to take more seriously my views on other issues I think are more important. Therefore, I choose to signal belief in X.

What is happening here is that:

(A) People are assigning reputation based on claims, and judging claims based partially on other beliefs signaled by the same person.

(B) People want their claims to be taken seriously, and so take actions which will preserve and improve their reputation.

Therefore,

(C) People signal beliefs that they believe are false because those beliefs are shared by the community.

Signaling honest beliefs is kind of like cooperating in a prisoner’s dilemma. It helps the community push towards reaching what you believe are valid conclusions, at the cost of your own reputation. It is possible for us to decide as a community that we want to cooperate, especially with tools such as the anti-kibitzer. However, there is more than one way to do that. I think there are three options. They are all theoretically possible, but I think they are all bad.

(1) We can agree to stop assigning reputation based on beliefs.

This option is bad because there is a loss of information. People who made the right choice on one issue are more likely to make the right choice on other issues, and we are ignoring this correlation.

(2) We can agree to always report honest beliefs even though we know it will cost us reputation.

This option is bad because it encourages self-deception. If you commit to honestly report beliefs, and you can gain more reputation by reporting belief in X, you may trick yourself into thinking that you believe X.

(3) We can allow dishonest reporting of beliefs to continue.

This option is bad because it causes a bias. The community will get a source of evidence biased towards their current beliefs.

Which option do you prefer? Am I missing a fourth option? Is one of the choices obviously the best or obviously the worst? Should we combine them somehow? Am I modeling the problem entirely wrong?

Alternative to Bayesian Score

I am starting to wonder whether or not Bayes Score is what I want to maximize for my epistemic rationality. I started thinking about this while trying to design a board game to teach calibration of probabilities, so I will use that as my example:

I wanted a scoring mechanism which motivates honest reporting of probabilities and rewards players who are better calibrated. For simplicity, let’s assume that we only have to deal with true/false questions for now. A player is given a question which they believe is true with probability p. They then name a real number x between 0 and 1. Then, they receive a score which is a function of x and whether or not the statement is true. We want the expected score to be maximized exactly when x=p. Let f(x) be the score if the statement is true, and let g(x) be the score if the statement is false. Then the expected score is (p)f(x)+(1-p)g(x). If we assume f and g are smooth, then in order to have a maximum at x=p, we want (p)f'(p)+(1-p)g'(p)=0, which still leaves us with a large class of functions. It would also be nice to have symmetry by requiring f(x)=g(1-x). If we further require this, we get (p)f'(p)+(1-p)(-1)f'(1-p)=0, or equivalently (p)f'(p)=(1-p)f'(1-p). One way to achieve this is to set (x)f'(x) to be a constant. Then f'(x)=c/x, so f(x)=log x. This scoring mechanism is referred to as the “Bayesian Score.”

However, another natural way to achieve this is by setting f'(p)/(1-p) equal to a constant. If we set this constant equal to 2, we get f'(x)=2-2x, which gives us f(x)=2x-x^2=1-(1-x)^2. I will call this the “Squared Error Score.”
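As a quick numeric sanity check (a sketch of mine, not from the post), both rules are proper in the sense described above: for any belief p, the expected score (p)f(x)+(1-p)f(1-x) peaks at x=p.

```python
# Numeric check (not from the post): both scoring rules are maximized by honesty.
import numpy as np

def expected_score(f, p, xs):
    """Expected score for reporting x given belief p, with the symmetric rule g(x)=f(1-x)."""
    return p * f(xs) + (1 - p) * f(1 - xs)

bayesian = np.log                            # f(x) = log x
squared_error = lambda x: 1 - (1 - x) ** 2   # f(x) = 2x - x^2

xs = np.linspace(0.001, 0.999, 999)          # grid of possible reports
for p in (0.3, 0.5, 0.9):
    for name, f in (("Bayesian", bayesian), ("Squared Error", squared_error)):
        best = xs[np.argmax(expected_score(f, p, xs))]
        print(f"belief {p}: {name} score is maximized near x = {best:.3f}")
```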

There are many other functions which satisfy the desired conditions, but these two are the simplest, so I will focus on these two.

Eliezer argues for the Bayesian Score in A Technical Explanation of Technical Explanation, which I recommend reading. The reason he prefers the Bayesian Score is that he wants the sum of the scores associated with determining P(A) and P(B|A) to equal the score for determining P(A&B). In other words, he wants it not to matter whether you break a problem up into one experiment or two experiments. This is a legitimate virtue of the scoring mechanism, but I think that many people think it is a lot more valuable than it is. It does not eliminate the problem that we do not know which questions to ask. It gives us the same answer regardless of how we break an experiment up into smaller experiments, but our score is still dependent on which questions are asked, and this cannot be fixed by just saying, “Ask all questions.” There are infinitely many of them, and the sum does not converge. Because the score is still a function of which questions are asked, the fact that it gives the same answer for some related sets of questions is not a huge benefit.

One nice thing about the Squared Error Score is that it always gives a score between 0 and 1, which means we can actually use it in real life. For example, we could ask someone to construct a spinner that comes up either true or false, and then spin it twice. They win if either of the two spins comes up with the true answer. In this case, the best strategy is to build the spinner so that it comes up true with probability p. There is no way to do anything similar for the Bayesian Score; in fact, it is questionable whether or not arbitrarily low utilities even make sense.
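Here is a small deterministic sketch (my illustration, with an arbitrary belief of p=0.7) showing that the two-spin game’s win probability is exactly the expected Squared Error Score, so the best spinner matches your belief:

```python
# Sketch (my numbers): the two-spin game rewards building a spinner that matches your belief.
import numpy as np

def spinner_win_probability(x, p):
    """Win probability when the spinner shows "true" with probability x and you
    believe the answer is "true" with probability p; you win if at least one of
    the two spins shows the correct answer."""
    return p * (1 - (1 - x) ** 2) + (1 - p) * (1 - x ** 2)

p = 0.7                                   # an arbitrary belief, for illustration
xs = np.linspace(0, 1, 1001)
best = xs[np.argmax(spinner_win_probability(xs, p))]
print(round(float(best), 3))              # -> 0.7: the best spinner matches your belief

# The win probability is exactly the expected Squared Error Score.
squared_error = lambda x: 1 - (1 - x) ** 2
expected_sqerr = p * squared_error(xs) + (1 - p) * squared_error(1 - xs)
print(np.allclose(spinner_win_probability(xs, p), expected_sqerr))  # -> True
```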

The Bayesian Score is slightly easier to generalize to multiple choice questions. The Squared Error Score can also be generalized, but your score then unfortunately depends on more than just the probability you assigned to the correct answer. For example, if A is the correct answer, you get more points for 80%A 10%B 10%C than for 80%A 20%B 0%C. The function you want for multiple options is: if you assign probabilities x_1 through x_n and the first option is correct, you get a score of 2x_1 - x_1^2 - x_2^2 - … - x_n^2. I do not think this is as bad as it seems. It kind of makes sense that when the answer is A, you get penalized slightly for saying that you are much more confident in B than in C, since making such a claim is a waste of information. To view this as a spinner, you construct a spinner, spin it twice, and you win if either spin gets the correct answer, or if the first spin comes lexicographically strictly before the second spin.
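A tiny sketch (the numbers are just the example above, and the function name is mine) of the multiple-choice score:

```python
# The example above, scored with the generalized Squared Error Score (function name mine).
def multi_squared_error_score(probs, correct_index):
    """2 * x_correct minus the sum of the squares of all reported probabilities."""
    return 2 * probs[correct_index] - sum(x ** 2 for x in probs)

print(round(multi_squared_error_score([0.8, 0.1, 0.1], 0), 2))  # -> 0.94
print(round(multi_squared_error_score([0.8, 0.2, 0.0], 0), 2))  # -> 0.92
```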

For the purpose of my calibration game, I will almost certainly use Squared Error Scoring, because log is not feasible. But it got me thinking about why I am not thinking in terms of Squared Error Score in real life.

You might ask what the experimental difference between the two is, since they are both maximized by honest probabilities. Well, if I have two questions and I want to maximize my (possibly weighted) average score, and I have a limited amount of time to research and improve my answers, then it matters how much the scoring mechanism penalizes various errors. The Bayesian Score penalizes so much for being sure of one false thing that none of the other scores really matter, while the Squared Error Score is much more forgiving. If we normalize so that 50/50 gives 0 points and correct certainty gives 1 point, then the Squared Error Score gives -3 points for false certainty while the Bayesian Score gives negative infinity.
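To make that comparison concrete, here is a sketch of the normalization (my own illustration; the Bayesian value is computed at a small probability rather than exactly zero, where it is undefined):

```python
# Sketch of the normalization: 0 points for reporting 50/50, 1 point for correct certainty.
import math

def normalized(f, x):
    return (f(x) - f(0.5)) / (f(1.0) - f(0.5))

bayesian = math.log                        # f(x) = log x
squared_error = lambda x: 1 - (1 - x) ** 2

print(normalized(squared_error, 0.0))      # -> -3.0 for false certainty
print(normalized(bayesian, 1e-9))          # -> about -28.9, heading to -infinity as x -> 0
```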

I view maximizing Bayesian Score as the Golden Rule of epistemic rationality, so even a small chance that something else might be better is worth investigating. Even if you are fully committed to Bayesian Score, I would love to hear any pros or cons you can think of in either direction.

How should Eliezer and Nick’s extra $20 be split

In “Principles of Disagreement,” Eliezer Yudkowsky shared the following anecdote:

Nick Bostrom and I once took a taxi and split the fare. When we counted the money we’d assembled to pay the driver, we found an extra twenty there.

“I’m pretty sure this twenty isn’t mine,” said Nick.

“I’d have been sure that it wasn’t mine either,” I said.

“You just take it,” said Nick.

“No, you just take it,” I said.

We looked at each other, and we knew what we had to do.

“To the best of your ability to say at this point, what would have been your initial probability that the bill was yours?” I said.

“Fifteen percent,” said Nick.

“I would have said twenty percent,” I said.

I have left off the ending to give everyone a chance to think about this problem for themselves. How would you have split the twenty?

In general, EY and NB disagree about who deserves the twenty. EY believes that EY deserves it with probability p, while NB believes that EY deserves it with probability q. They decide to give EY a fraction of the twenty equal to f(p,q). What should the function f be?

In our example, p=1/5 and q=17/20.

Please think about this problem a little before reading on, so that we do not miss out on any original solutions that you might have come up with.


I can think of 4 ways to solve this problem. I am attributing answers to the person who first proposed that dollar amount, but my reasoning might not reflect their reasoning.

  1. f=p/(1+p-q) or $11.43 (Eliezer Yudkowsky/Nick Bostrom) — EY believes he deserves p of the money, while NB believes he deserves 1-q. They should therefore be given money in a ratio of p:1-q.
  2. f=(p+q)/2 or $10.50 (Marcello) — It seems reasonable to assume that there is a 50% chance that EY reasoned properly and a 50% chance that NB reasoned properly, so we should take the average of the amounts of money that EY would get under these two assumptions.
  3. f=sqrt(pq)/(sqrt(pq)+sqrt((1-p)(1-q))) or $10.87 (GreedyAlgorithm) — We want to choose an f so that log(f/(1-f)) is the average of log(p/(1-p)) and log(q/(1-q)).
  4. f=pq/(pq+(1-p)(1-q)) or $11.72 — We have two observations that EY deserves the money with probability p and probability q respectively. If we assume that these are two independent pieces of evidence as to whether or not EY should get the money, then starting with equal likelihood of each person deserving the money, we should do a Bayesian update for each piece of information.
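For reference, here is a quick sketch computing all four proposals; the function and label names are mine:

```python
# Quick sketch computing all four proposed splits (names and labels are mine).
from math import sqrt

def splits(p, q, total=20.0):
    """EY's share of the total under each of the four proposals."""
    proposals = {
        "1: ratio p : 1-q": p / (1 + p - q),
        "2: average of p and q": (p + q) / 2,
        "3: average of log odds": sqrt(p * q) / (sqrt(p * q) + sqrt((1 - p) * (1 - q))),
        "4: Bayesian update on both": p * q / (p * q + (1 - p) * (1 - q)),
    }
    return {name: round(total * f, 2) for name, f in proposals.items()}

print(splits(p=1/5, q=17/20))
# -> {'1: ratio p : 1-q': 11.43, '2: average of p and q': 10.5,
#     '3: average of log odds': 10.87, '4: Bayesian update on both': 11.72}
```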

I am very curious about this question, so if you have any opinions, please comment. I have some opinions on this problem, but to avoid biasing anyone, I will save them for the comments. I am actually more interested in the following question. I believe that the two will have the same answer, but if anyone disagrees, let me know.

I have two hypotheses, A and B. I assign probability p to A and probability q to B. I later find out that A and B are equivalent. I then update to assign the probability g(p,q) to both hypotheses. What should the function g be?