In this post, I propose an answer to the following question:

Given a consistent but incomplete theory, how should one choose a random model of that theory?

My proposal is rather simple. Just assign probabilities to sentences in such a way that, if an adversary were to choose a model, your Worst Case Bayes Score is maximized. This assignment of probabilities represents a probability distribution on models, and you choose randomly from this distribution. However, it will take some work to show that what I just described even makes sense. We need to show that Worst Case Bayes Score can be maximized, that such a maximum is unique, and that this assignment of probabilities to sentences represents an actual probability distribution. This post gives the necessary definitions and proves these three facts.

Finally, I will show that any given probability assignment is coherent if and only if it is impossible to change the probability assignment in a way that simultaneously improves the Bayes Score by an amount bounded away from 0 in all models. This is nice because it gives us a measure of how far a probability assignment is from being coherent. Namely, we can define the “incoherence” of a probability assignment to be the supremum amount by which you can simultaneously improve the Bayes Score in all models. This could be a useful notion, since we usually cannot compute a coherent probability assignment, so in practice we need to work with incoherent probability assignments that approach a coherent one.

Now, let’s move on to the formal definitions and proofs.

Fix some language $L$, for example the language of first order set theory. Fix a consistent theory $T$ of $L$, for example ZFC. Fix a nowhere zero probability measure $\mu$ on $L$, for example $\mu(\varphi) = 2^{-\ell(\varphi)}$, where $\ell(\varphi)$ is the number of bits necessary to encode $\varphi$.

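A probability assignment of $L$ is any function from $L$ to the interval $[0,1]$. Note that this can be any function, and does not have to represent a probability distribution. Given a probability assignment $P$ of $L$, and a model $M$ of $T$, we can define the Bayes Score of $P$ with respect to $M$ by

$$\mathrm{Bayes}(M,P) = \sum_{M \models \varphi} \mu(\varphi)\log_2 P(\varphi) + \sum_{M \models \neg\varphi} \mu(\varphi)\log_2\bigl(1 - P(\varphi)\bigr).$$

We define the Worst Case Bayes Score $\mathrm{WCB}(P)$ to be the infimum of $\mathrm{Bayes}(M,P)$ over all models $M$ of $T$.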
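To make the two definitions concrete, here is a minimal sketch on a toy example. Everything in it is an assumption made for illustration: a "language" of three sentences, a theory whose models make exactly one of them true, the weights in `MU`, and the function names. The score is taken to be the $\mu$-weighted base-2 log score.

```python
import math

# Toy setup (illustrative assumptions, not from the post): three sentences,
# and a theory whose models make exactly one of them true.
SENTENCES = ["s1", "s2", "s3"]
MU = {"s1": 0.5, "s2": 0.25, "s3": 0.25}  # nowhere-zero weights summing to 1
MODELS = [
    {"s1": True, "s2": False, "s3": False},
    {"s1": False, "s2": True, "s3": False},
    {"s1": False, "s2": False, "s3": True},
]

def bayes_score(model, p, mu=MU):
    """Weighted log score: log2 P(s) if the model satisfies s, else log2(1 - P(s))."""
    total = 0.0
    for s, weight in mu.items():
        prob_of_truth = p[s] if model[s] else 1.0 - p[s]
        if prob_of_truth == 0.0:
            return -math.inf  # probability 0 on what actually happened
        total += weight * math.log2(prob_of_truth)
    return total

def worst_case_bayes_score(p, models=MODELS, mu=MU):
    """Infimum over models; a plain min here because the toy has finitely many."""
    return min(bayes_score(m, p, mu) for m in models)

half = {s: 0.5 for s in SENTENCES}
print(worst_case_bayes_score(half))
```

The identically 1/2 assignment loses exactly $\mu(\varphi)$ bits on every sentence, so this prints `-1.0` regardless of which model the adversary picks.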

Let $P_\mu$ denote the probability assignment that maximizes the function $\mathrm{WCB}$. We will show that this maximum exists and is unique, so $P_\mu$ is well defined.

In fact, $P_\mu$ is also coherent, meaning that there exists a probability distribution on the set of all models of $T$ such that $P_\mu(\varphi)$ is exactly the probability that a randomly chosen model satisfies $\varphi$. Since the natural definition of a measurable subset of models comes from unions and intersections of the sets of all models satisfying a given sentence, we can think of $P_\mu$ as an actual probability distribution on the set of all models of $T$.
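As a sanity check on what the maximizer looks like, here is a brute-force sketch on a hypothetical toy: three sentences of equal weight, with a theory that makes exactly one of them true in every model. The grid search is only a stand-in for a real optimizer, and all names are illustrative.

```python
import itertools
import math

# Hypothetical toy: three sentences, exactly one true in each model, uniform mu.
MU = (1 / 3, 1 / 3, 1 / 3)
MODELS = [(True, False, False), (False, True, False), (False, False, True)]

def bayes_score(model, p):
    """mu-weighted log2 score of the assignment p in the given model."""
    return sum(w * math.log2(q if holds else 1.0 - q)
               for w, q, holds in zip(MU, p, model))

def wcb(p):
    """Worst case over the (finitely many) models of the toy theory."""
    return min(bayes_score(m, p) for m in MODELS)

# Search the grid {1/30, ..., 29/30}^3 for the assignment with the best worst case.
grid = [k / 30 for k in range(1, 30)]
best = max(itertools.product(grid, repeat=3), key=wcb)

print(best, wcb(best))
print(sum(best))  # coherence here means the three probabilities sum to 1
```

In this toy the search lands on probability 1/3 for each sentence, which beats the identically 1/2 assignment and is exactly the uniform distribution over the three models.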

First, we must show that there exists a probability assignment $P$ which maximizes $\mathrm{WCB}$.

Note that $\mathrm{Bayes}(M,P)$ either diverges to $-\infty$, or converges to a non-positive real number. If $P$ is the identically $1/2$ function, then $\mathrm{WCB}(P) = -1$, so there is at least one $P$ for which $\mathrm{WCB}(P)$ is finite. This means that when maximizing $\mathrm{WCB}(P)$, we need only consider $P$ for which $\mathrm{Bayes}(M,P)$ converges to a number between $-1$ and $0$ for all $M$.

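Assume by way of contradiction that there is no $P$ which maximizes $\mathrm{WCB}$. Then there must be some supremum value $m$ such that $\mathrm{WCB}$ can get arbitrarily close to $m$, but never equals or surpasses $m$. Consider an infinite sequence of probability assignments $\{P_i\}$ such that $\mathrm{WCB}(P_i) \to m$. We may pass to a subsequence of $\{P_i\}$ (by a diagonal argument, since $L$ is countable) in order to assume that $\{P_i(\varphi)\}$ converges for every sentence $\varphi$. Let $P$ be such that $P_i(\varphi) \to P(\varphi)$ for all $\varphi$.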

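By assumption, $\mathrm{WCB}(P)$ must be less than $m$. Take any model $M$ for which $\mathrm{Bayes}(M,P) < m$. Then there exists a finite subset $S$ of $L$ such that $\mathrm{Bayes}_S(M,P) < m$, where

$$\mathrm{Bayes}_S(M,P) = \sum_{\varphi \in S,\ M \models \varphi} \mu(\varphi)\log_2 P(\varphi) + \sum_{\varphi \in S,\ M \models \neg\varphi} \mu(\varphi)\log_2\bigl(1 - P(\varphi)\bigr).$$

Note that in order to keep the Bayes score at least $-1$, any $P_i$ must satisfy $P_i(\varphi) \geq 2^{-1/\mu(\varphi)}$ if $M \models \varphi$, and $P_i(\varphi) \leq 1 - 2^{-1/\mu(\varphi)}$ if $M \models \neg\varphi$. Consider the space of all functions $f$ from $S$ to $[0,1]$ satisfying these inequalities. Since there are only finitely many values restricted to closed and bounded intervals, this space is compact. Further, $\mathrm{Bayes}_S(M,f)$ is a continuous function of $f$, defined everywhere on this compact set. Therefore,

$$\lim_{i \to \infty} \mathrm{Bayes}_S(M,P_i) = \mathrm{Bayes}_S(M,P) < m.$$

However, clearly $\mathrm{WCB}(P_i) \leq \mathrm{Bayes}(M,P_i) \leq \mathrm{Bayes}_S(M,P_i)$, so

$$\lim_{i \to \infty} \mathrm{WCB}(P_i) \leq \mathrm{Bayes}_S(M,P) < m,$$

contradicting our assumption that $\mathrm{WCB}(P_i)$ converges to $m$.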

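Next, we will show that there is a unique probability assignment which maximizes $\mathrm{WCB}$. Assume by way of contradiction that there were two probability assignments, $P_1$ and $P_2$, which maximize $\mathrm{WCB}$. Consider the probability assignment $P_3$, given by

$$P_3(\varphi) = \frac{\sqrt{P_1(\varphi)P_2(\varphi)}}{\sqrt{P_1(\varphi)P_2(\varphi)} + \sqrt{(1-P_1(\varphi))(1-P_2(\varphi))}}.$$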
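It is quick to check that this definition satisfies both

$$\log_2 P_3(\varphi) \geq \frac{\log_2 P_1(\varphi) + \log_2 P_2(\varphi)}{2}$$

and

$$\log_2\bigl(1 - P_3(\varphi)\bigr) \geq \frac{\log_2\bigl(1 - P_1(\varphi)\bigr) + \log_2\bigl(1 - P_2(\varphi)\bigr)}{2},$$

and in both cases equality holds only when $P_1(\varphi) = P_2(\varphi)$.

Therefore, we get that for any fixed model $M$,

$$\mathrm{Bayes}(M,P_3) > \frac{\mathrm{Bayes}(M,P_1) + \mathrm{Bayes}(M,P_2)}{2}.$$

By looking at the improvement coming from a single sentence $\varphi$ with $P_1(\varphi) \neq P_2(\varphi)$ we see that

$$\mathrm{Bayes}(M,P_3) - \frac{\mathrm{Bayes}(M,P_1) + \mathrm{Bayes}(M,P_2)}{2}$$

is actually bounded away from $0$, which means that

$$\mathrm{WCB}(P_3) > \frac{\mathrm{WCB}(P_1) + \mathrm{WCB}(P_2)}{2},$$

which contradicts the fact that $P_1$ and $P_2$ maximize $\mathrm{WCB}$.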
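The two log inequalities used in the uniqueness argument are easy to spot-check numerically. The sketch below assumes one particular way of combining two assignments, the normalized geometric mean of the two probabilities; that choice and the function names are assumptions made for illustration.

```python
import math

def combine(p1, p2):
    """Normalized geometric mean of two probabilities (an assumed choice of combination)."""
    yes = math.sqrt(p1 * p2)
    no = math.sqrt((1.0 - p1) * (1.0 - p2))
    return yes / (yes + no)

def log_gains(p1, p2):
    """Amounts by which log2 P3 and log2(1 - P3) exceed the averages of the matching logs."""
    p3 = combine(p1, p2)
    gain_true = math.log2(p3) - (math.log2(p1) + math.log2(p2)) / 2
    gain_false = math.log2(1 - p3) - (math.log2(1 - p1) + math.log2(1 - p2)) / 2
    return gain_true, gain_false

grid = [k / 20 for k in range(1, 20)]
worst = min(min(log_gains(a, b)) for a in grid for b in grid)
print(worst)                # zero up to rounding, attained when p1 == p2
print(log_gains(0.2, 0.7))  # both gains strictly positive when p1 != p2
```

With this combination both gains equal minus the log of the normalizing denominator, which is at most 1 by Cauchy-Schwarz, so a single sentence on which the two assignments differ already yields a fixed positive improvement in every model.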

This means that there is a unique probability assignment, $P_\mu$, which maximizes $\mathrm{WCB}$, but we still need to show that $P_\mu$ is coherent. For this, we will use the alternate definition of coherence given in Theorem 1 here. Namely that $P$ is coherent if and only if $P$ assigns probability 0 to every contradiction, probability 1 to every tautology, and satisfies $P(\varphi) = P(\varphi \wedge \psi) + P(\varphi \wedge \neg\psi)$ for all $\varphi$ and $\psi$.

Clearly $P_\mu$ assigns probability 0 to every contradiction, since otherwise we could increase the Bayes Score in all models by the same amount by updating that probability to 0. Similarly, $P_\mu$ clearly assigns probability 1 to all tautologies.

If $P_\mu(\varphi) \neq P_\mu(\varphi \wedge \psi) + P_\mu(\varphi \wedge \neg\psi)$, then we update all three probabilities as follows:

and

where $x$ is the unique real number such that the three new probabilities satisfy $P_\mu(\varphi) = P_\mu(\varphi \wedge \psi) + P_\mu(\varphi \wedge \neg\psi)$. This correction increases the Bayes Score by the same amount in all models, and therefore increases $\mathrm{WCB}$, contradicting the maximality of $P_\mu$. Therefore $P_\mu$ is coherent, as desired.
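For intuition about the coherence criterion used in this argument, the following sketch checks its three conditions for an assignment that really does come from a distribution over models. The setup is a hypothetical toy: models are truth assignments to two atoms, sentences are Python predicates, and the weights are arbitrary.

```python
import itertools

# Hypothetical toy: models are truth assignments to two atoms (a, b),
# and a "sentence" is any Python predicate on such an assignment.
WORLDS = list(itertools.product([True, False], repeat=2))
WEIGHTS = dict(zip(WORLDS, [0.4, 0.3, 0.2, 0.1]))  # arbitrary distribution on models

def prob(sentence):
    """Probability that a randomly drawn model satisfies the sentence."""
    return sum(w for world, w in WEIGHTS.items() if sentence(*world))

def phi(a, b):
    return a or b

def psi(a, b):
    return b

def phi_and_psi(a, b):
    return phi(a, b) and psi(a, b)

def phi_and_not_psi(a, b):
    return phi(a, b) and not psi(a, b)

print(prob(lambda a, b: a or not a))   # tautology
print(prob(lambda a, b: a and not a))  # contradiction
print(prob(phi), prob(phi_and_psi) + prob(phi_and_not_psi))
```

The last line prints two numbers that agree up to rounding, because the models satisfying $\varphi$ split exactly into those satisfying $\varphi \wedge \psi$ and those satisfying $\varphi \wedge \neg\psi$.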

Finally, we show that any given probability assignment $P$ is coherent if and only if it is impossible to simultaneously improve the Bayes Score by an amount bounded away from 0 in all models. The above proof that $P_\mu$ is coherent actually shows one direction of this proof, since the only fact it used about $P_\mu$ is that you could not simultaneously improve the Bayes Score by an amount bounded away from 0 in all models. For the other direction, assume by way of contradiction that $P$ is coherent, and that there exists a $Q$ and an $\epsilon > 0$ such that $\mathrm{Bayes}(M,Q) - \mathrm{Bayes}(M,P) > \epsilon$ for all $M$.

In particular, since $P$ is coherent, it represents a probability distribution on models, so we can choose a random model $M$ from the distribution $P$. If we do so, we must have that

$$\mathbb{E}\bigl[\mathrm{Bayes}(M,Q)\bigr] > \mathbb{E}\bigl[\mathrm{Bayes}(M,P)\bigr].$$

However, this contradicts the well known fact that the expectation of Bayes Score is maximized by choosing honest probabilities corresponding to the actual distribution $M$ is chosen from.
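The well known fact invoked here is that the log score is a proper scoring rule. Since the expected Bayes Score is a $\mu$-weighted sum over sentences, it is enough to look at one sentence at a time. A minimal numerical sketch, with the true probability 0.3 and the grid chosen arbitrarily for illustration:

```python
import math

def expected_score(true_p, reported_q):
    """Expected log2 score on one sentence that is true with probability true_p."""
    return true_p * math.log2(reported_q) + (1 - true_p) * math.log2(1 - reported_q)

true_p = 0.3
grid = [k / 100 for k in range(1, 100)]
best_q = max(grid, key=lambda q: expected_score(true_p, q))
print(best_q)
```

Among the reported probabilities on the grid, the honest report 0.3 gets the highest expected score, so no other assignment can do better in expectation under the true distribution.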

I would be very grateful if anyone can come up with a proof that this probability distribution which maximizes Worst Case Bayes Score has the property that its Bayes Score is independent of the choice of what model we use to judge it. In other words, show that $\mathrm{Bayes}(M,P_\mu)$ is independent of $M$. I believe it is true, but have not yet found a proof.
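Returning to the incoherence measure from the introduction, here is a rough sketch of how it could be approximated in a small finite case. The toy is hypothetical (three equally weighted sentences, exactly one true in each model), and searching a grid of candidate assignments only gives a lower bound on the supremum.

```python
import itertools
import math

# Hypothetical toy: three sentences, exactly one true in each model, uniform mu.
MU = (1 / 3, 1 / 3, 1 / 3)
MODELS = [(True, False, False), (False, True, False), (False, False, True)]
GRID = [k / 30 for k in range(1, 30)]

def bayes_score(model, p):
    """mu-weighted log2 score of the assignment p in the given model."""
    return sum(w * math.log2(q if holds else 1.0 - q)
               for w, q, holds in zip(MU, p, model))

def incoherence(p):
    """Largest improvement, over grid candidates Q, that holds in every model at once."""
    base = [bayes_score(m, p) for m in MODELS]

    def guaranteed_gain(q):
        return min(bayes_score(m, q) - b for m, b in zip(MODELS, base))

    return max(guaranteed_gain(q) for q in itertools.product(GRID, repeat=3))

print(incoherence((1 / 3, 1 / 3, 1 / 3)))  # a coherent assignment
print(incoherence((0.5, 0.5, 0.5)))        # an incoherent one
```

The coherent assignment comes out at zero, while the identically 1/2 assignment, whose probabilities sum to 1.5 over three mutually exclusive sentences, can be improved by about 0.08 bits in every model at once.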

Neat idea! Especially your idea for measuring incoherence.

For your last question I’m beginning to think that it isn’t true. For example what if we build a theory like this: go through L in order of decreasing μ, and each time we reach a statement undecided by our theory so far we add it or its negation according to which is more likely under P. I suspect that this theory will give P a good Bayes score, since it agrees with P “where it counts” on statements with high μ.

On the other hand, at first I thought that maybe *no* model achieved the worst case, and that the infimum was only an infimum. This definitely isn’t true. To find a model that achieves the worst case, take a sequence of models whose Bayes scores tend to the infimum and for each φ pass to an infinite subsequence where the truth value of that φ is constant.

The dependence on μ is a bit distressing in general. Maybe we could set μ proportional to c^(-l(φ)) and then take a limit as c goes to 1, hoping we can force P to converge somehow?

By the way you have a typo in the third paragraph from bottom. “Bayes(M,Q)-Bayes(M,Q)” should have a “P” replacing one of the “Q”s.

Thanks, fixed the typo.

The conjecture is at least true when the language is finite. It is also the case that the set of models for which Bayes score is near WCB is an open and everywhere dense subset of the set of all models. I think I can prove, but have not checked formally, that P achieves its worst case score with probability 1 (with respect to its own probability distribution), so if P does not have constant Bayes score, it is at least really close to having constant Bayes score.

I have not yet thought about taking a limit as c goes to 1. I would guess that it is not going to converge, but if it did, that would be very nice. I will think about that more.

There’s a question I forgot to ask:

If we add an additional axiom to our theory, does P change according to Bayes rule?

It seems pretty important for this to be true.

No, it does not. Abram Demski proposed a different solution to this problem which was to randomly sample sentences and add them to the theory if consistent, and his does not either. This makes me sad.

I think that one should start with an empty theory, find a probability distribution, and do the rest with Bayesian updates.