Maximize Worst Case Bayes Score

In this post, I propose an answer to the following question:

Given a consistent but incomplete theory, how should one choose a random model of that theory?

My proposal is rather simple. Just assign probabilities to sentences in such that if an adversary were to choose a model, your Worst Case Bayes Score is maximized. This assignment of probabilities represents a probability distribution on models, and choose randomly from this distribution. However, it will take some work to show that what I just described even makes sense. We need to show that Worst Case Bayes Score can be maximized, that such a maximum is unique, and that this assignment of probabilities to sentences represents an actual probability distribution. This post gives the necessary definitions, and proves these three facts.

Finally, I will show that any given probability assignment is coherent if and only if it is impossible to change the probability assignment in a way that simultaneously improves the Bayes Score by an amount bounded away from 0 in all models. This is nice because it gives us a measure of how far a probability assignment is from being coherent. Namely, we can define the “incoherence” of a probability assignment to be the supremum amount by which you can simultaneously improve the Bayes Score in all models. This could be a useful notion since we usually cannot compute a coherent probability assignment so in practice we need to work with incoherent probability assignments which approach a coherent one.

Now, let’s move on to the formal definitions and proofs.

Fix some language L, for example the language of first order set theory. Fix a consistent theory T of L, for example ZFC. Fix a nowhere zero probability measure \mu on L, for example \mu(\phi)=2^{-\ell(\phi)}, where \ell(\phi) is the number of bits necessary to encode \phi.

A probability assignment of L is any function from L to the interval [0,1]. Note that this can be any function, and does not have to represent a probability distribution. Given a probability assignment P of L, and a model M of T, we can define the Bayes Score of P with respect to M by

{\displaystyle \mbox{Bayes}(M,P)=\sum_{M\models \phi}\log_2(P(\phi))\mu(\phi)+\sum_{M\models\neg \phi}\log_2(1-P(\phi))\mu(\phi). }

We define the Worst Case Bayes Score \mbox{WCB}(P) to be the infimum of \mbox{Bayes}(M,P) over all models M of T.
Let \mathbb{P} denote the probability assignment that maximizes the function \mbox{WCB}. We will show that this maximum exists and is unique, so \mathbb{P} is well defined.

In fact, \mathbb{P} also coherent, meaning that there exists a probability distribution on the set of all models of T such that \mathbb{P}(\phi) is exactly the probability that a randomly chosen model satisfies \phi. Since the natural definition of a measurable subset of models comes from unions and intersections of the sets of all models satisfying a given sentence, we can think of \mathbb{P} as an actual probability distribution on the set of all models of T.

First, we must show that there exists a probability assignment P which maximizes \mbox{WCB}.

Note that \mbox{Bayes}(M,P) either diverges to -\infty, or converges to a non-positive real number. If P is the identically 1/2 function, then \mbox{WCB}(P)=-1, so there is at least one P for which \mbox{WCB}(P) is finite. This means that when maximizing \mbox{WCB}(P), we need only consider P for which \mbox{Bayes}(M,P) converges to a number between -1 and 0 for all M.

Assume by way of contradiction that there is no P which maximizes \mbox{WCB}. Then there must be some supremum value m such that \mbox{WCB} can get arbitrarily close to m, but never equals or surpasses m. Consider an infinite sequence probability assignments \{P_i\} such that \mbox{WCB}(P_i)\rightarrow m. We may take a subsequence of \{P_i\} in order to assume that  \{P_i(\phi)\} converges for every sentence \phi. Let P be such that P_i(\phi)\rightarrow P(\phi) for all \phi.

By assumption, \mbox{WCB}(P) must be less than m. Take any model M for which \mbox{Bayes}(M,P)<m. Then there exists a finite subset S of L such that \mbox{Bayes}_S(M,P)<m, where

{\displaystyle \mbox{Bayes}_S(M,P)=\sum_{\phi\in S, M\models \phi}\log_2(P(\phi))\mu(\phi)+\sum_{\phi\in S, M\models\neg \phi}\log_2(1-P(\phi))\mu(\phi). }

Note that in order to keep the Bayes score at least -1, any P_i must satisfy 2^{-1/\mu(\phi)}\leq P_i(\phi)\leq 1 if M\models \phi, and 0\leq P_i(\phi)\leq 1-2^{-1/\mu(\phi)} if M\models\neg\phi. Consider the space of all functions f from S to [0,1] satisfying these inequalities. Since there are only finitely many values restricted to closed and bounded intervals, this space is compact. Further, \mbox{Bayes}_S(M,f) is a continuous function of f, defined everywhere on this compact set. Therefore,

{\displaystyle \lim_{i\rightarrow\infty}\mbox{Bayes}_S(M,P_i)=\mbox{Bayes}_S(M,P)<m.}

However, clearly \mbox{WCB}(P_i)\leq\mbox{Bayes}(M,P_i)\leq\mbox{Bayes}_S(M,P_i), so

{\displaystyle \lim_{i\rightarrow\infty}\mbox{WCB}(P_i)<m,}

contradicting our assumption that \mbox{WCB}(P_i) converges to m.

Next, we will show that there is a unique probability assignment which maximizes \mbox{WCB}. Assume by way of contradiction that there were two probability assignments, P_1 and P_2 which maximize \mbox{WCB}. Consider the probability assignment P_3, given by

{\displaystyle P_3(\phi)=\frac{\sqrt{P_1(\phi)P_2(\phi)}}{\sqrt{P_1(\phi)P_2(\phi)}+\sqrt{(1-P_1(\phi))(1-P_2(\phi))}}.}

It is quick to check that this definition satisfies both

{\displaystyle \log_2(P_3(\phi))\geq \frac{\log_2(P_1(\phi))+\log_2(P_2(\phi))}{2}}

and

{\displaystyle \log_2(1-P_3(\phi))\geq \frac{\log_2(1-P_1(\phi))+\log_2(1-P_2(\phi))}{2},}

and in both cases equality holds only when P_1(\phi)=P_2(\phi).

Therefore, we get that for any fixed model, M,

{\displaystyle \mbox{Bayes}(M,P_3(\phi))\geq \frac{\mbox{Bayes}(M,P_1(\phi))+\mbox{Bayes}(M,P_2(\phi))}{2},}

By looking at the improvement coming from a single sentence \phi with P_1(\phi)\neq P_2(\phi), we see that

{\displaystyle \mbox{Bayes}(M,P_3(\phi))-\frac{\mbox{Bayes}(M,P_1(\phi))+\mbox{Bayes}(M,P_2(\phi))}{2},}

is actually bounded away from 0, which means that

{\displaystyle \mbox{WCB}(P_3(\phi))\geq \frac{\mbox{WCB}(P_1(\phi))+\mbox{WCB}(P_2(\phi))}{2},}

which contradicts the fact that P_1 and P_2 maximize \mbox{WCB}.

This means that there is a unique probability assignment, \mathbb{P}, which maximizes \mbox{WCB}, but we still need to show that \mathbb{P} is coherent. For this, we will use the alternate definition of coherence given in Theorem 1 here. Namely that \mathbb{P} is coherent if and only if \mathbb{P} assigns probability 0 to every contradiction, probability 1 to every tautology, and satisfies \mathbb{P}(\phi)=\mathbb{P}(\phi\wedge\psi)+\mathbb{P}(\phi\wedge\neg\psi) for all  \phi and  \psi.

Clearly \mathbb{P} assigns probability 0 to every contradiction, since otherwise we could increase the Bayes Score in all models by the same amount by updating that probability to 0. Similarly \mathbb{P} clearly assigns probability 1 to all tautologies.

If \mathbb{P}(\phi)\neq\mathbb{P}(\phi\wedge\psi)+\mathbb{P}(\phi\wedge\neg\psi), then we update all three probabilities as follows:

\mathbb{P}(\phi)\mapsto \frac{1}{1+\frac{1-\mathbb{P}(\phi)}{\mathbb{P}(\phi)}(2^{-x/\mu(\phi)})},

\mathbb{P}(\phi\wedge\psi)\mapsto \frac{1}{1+\frac{1-\mathbb{P}(\phi\wedge\psi)}{\mathbb{P}(\phi\wedge\psi)}(2^{x/\mu(\phi\wedge\psi)})},

and

\mathbb{P}(\phi\wedge\neg\psi)\mapsto \frac{1}{1+\frac{1-\mathbb{P}(\phi\wedge\neg\psi)}{\mathbb{P}(\phi\wedge\neg\psi)}(2^{x/\mu(\phi\wedge\neg\psi)})},

where x is the unique real number such that the three new probabilities satisfy \mathbb{P}(\phi)=\mathbb{P}(\phi\wedge\psi)+\mathbb{P}(\phi\wedge\neg\psi). This correction can increases Bayes Score by the same amount in all models, and therefore increase \mbox{WCB}, contradicting the maximality of \mbox{WCB}(\mathbb{P}). Therefore \mathbb{P} is coherent as desired.

Finally, we show that any given probability assignment P is coherent if and only if it is impossible to simultaneously improve the Bayes Score by an amount bounded away from 0 in all models. The above proof that \mathbb{P} is coherent actually shows one direction of this proof, since the only fact it used about \mathbb{P} is that you could not simultaneously improve the Bayes Score by an amount bounded away from 0 in all models. For the other direction, assume by way of contradiction that P is coherent, and that there exists a Q and an \epsilon>0 such that \mbox{Bayes}(M,Q)-\mbox{Bayes}(M,P)>\epsilon for all M.

In particular, since  P. is coherent, it represents a probability distribution on models, so we can choose a random model  M from the distribution  P. If we do so, we must have that

\mathbb{E}(\mbox{Bayes}(M,Q))-\mathbb{E}(\mbox{Bayes}(M,P))>0.

However, this contradicts the well known fact that the expectation of Bayes Score is maximized by choosing honest probabilities corresponding the actual distribution M is chosen from.

I would be very grateful if anyone can come up with a proof that this probability distribution which maximizes Worst Case Bayes Score has the property that its Bayes Score is independent of the choice of what model we use to judge it. In other words, show that \mbox{Bayes}(M,\mathbb{P}) is independent of M. I believe it is true, but have not yet found a proof.

Logic Puzzle: Count the Flags

You are tasked with designing a robot to explore a large but finite maze. The maze is drawn on a square grid, and walls exist on some of the edges of the square grid. Some of the squares contain a flag.

Your robot may interact with the world in the following ways:

1) Check which of the 4 adjacent edges contain walls.

2) Move to one of the 4 adjacent squares (provided there is no wall in the way).

3) Check if there is a flag on your square.

4) Pick up a flag (provided there is a flag on your square and the robot is not already holding a flag).

5) Put down a flag (provided the robot is holding a flag and there is not already a flag on your square).

6) Generate a random bit.

7) Output a number.

Your robot will be placed in a maze. The maze will contain some number of flags (from 100 to 1000). All flags will be reachable from the robot’s starting position. Your robot is tasked with determining the number of flags. The robot may take as long as it needs, but may only output one number and must output the correct answer eventually, with probability 1.

The catch is that your robot is not Turing complete. It only has a finite amount of memory. You can give your robot as much memory as you need, but it must succeed on arbitrarily large mazes.

As always, the solution will eventually be posted in the comments, but you are encouraged to show off by posting your solution first.

Hanabi

I recently bought a new card game, called Hanabi, and I strongly recommend it. At the time of writing this, you can get it for $10.42 on amazon. The thing that sets this aside from most card games is that it is completely cooperative. The thing that sets it aside from most cooperative games is that there is hidden information, so it is not just a single player game in disguise.

The basic idea is that 2 to 5 players each have a hand of 4 to 5 cards that they hold backwards. Each player can see all cards in other players hands, but not the cards in their own hands. Players take turns playing cards, discarding cards, or giving hints about other players’ cards. If you attempt to play an invalid card, you get a strike. In the end everyone’s score is the total number of valid cards played.

The game plays well for 2 to 5 players, but is rather difficult for 2 players. My wife and I got a perfect game on one of the  easier difficulty levels, but have not yet done so on the hardest difficulty level. I am convinced that a sufficiently well designed convention can win almost always. So far the game has been a hit with everyone I have introduced it to, and a couple people decided to buy it after playing their first game.

Enjoy!

Logic Puzzle: 5, 5, 7, 7 = 181

If you have not already solved the puzzle “One, Two, Three,” you should solve that one first. It is a warm-up for this one.

Using the numbers 5 and 7, each twice, and the combining them using addition, subtraction, multiplication, division, exponentiation, square root, factorial, unary negation, and/or parenthesis, but no base 10 shenanigans like digit concatenation, come up with an expression which evaluates to 181.

The solution will eventually be posted in the comments, but if you solve it before then, feel free to show off.

Logic Puzzle: One, Two, Three

Using the three digits, 1, 2, and 3, each at most once, and the combining them using addition, subtraction, multiplication, division, exponentiation, square root, factorial, unary negation, digit concatenation, decimal point, vinculum, and parenthesis, construct all the positive integers from 1 to 30. (Digit concatenation and decimal points only allowed on the original 3 digits. You do not need a 0 before the decimal point.)

The solution will eventually be posted in the comments, but if you solve it before then, feel free to show off (even with partial progress).

Less Wrong

Now that I am getting more traffic from sources such as reddit, facebook, and twitter, some of my viewers might be unaware of the existence of lesswrong.com. Less Wrong is “a community blog devoted to refining the art of human rationality.” If you are unfortunate enough to be unfamiliar with Less Wrong, you should stop wasting your time reading my blog, and instead discover all the amazing content Less Wrong has to offer.

I recommend browsing what looks interesting from the sequences for a little while until you manage to convince yourself that it is worth your time to read everything that Eliezer Yudkowsky has to offer. At which point, you should just read all of his posts in chronological order. You should then make an account, and participate in some of the amazing rationality discussions. If you enjoy the Less Wrong community, then you should also take a look to see if there is a Less Wrong meetup near you.

Much of my content here has been crossposted on Less Wrong. My username is Coscott, and you can see a list of my discussion posts here.

Logic Puzzle: Upside Down Cake

Imagine you have a circular cake, that is frosted on the top. You cut a d degree slice out of it, and then put it back, but rotated so that it is upside down. Now, d degrees of the cake have frosting on the bottom, while 360 minus d degrees have frosting on the top. Rotate the cake d degrees, take the next slice, and put it upside down. Now, assuming the d is less than 180, 2d degrees of the cake will have frosting on the bottom.

If d is 60 degrees, then after you repeat this procedure, flipping a single slice and rotating 6 times, all the frosting will be on the bottom. If you repeat the procedure 12 times, all of the frosting will be back on the top of the cake.

For what values of d does the cake eventually get back to having all the frosting on the top?

The solution will eventually be posted in the comments, but if you solve it before then, show off and post your own solution.