Occam's razor cuts bias

Why is the shortest theory to be preferred?

Firstly, of necessity, if we are to weigh a set of different theories we must assign probabilities to each theory such that those probabilities add to one. If there are an unbounded number of theories, and if we wish to assign each theory a non-zero probability, it seems the only reasonable way to assign probabilities is as a decreasing function of some measure of complexity.
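To make that concrete, here is a minimal sketch, under assumptions that are mine and not part of the argument above: theories are represented as binary strings, complexity is simply string length, and the particular prior 2^-(2n+1) for a string of length n is just one convenient choice among many. The names `length_prior`, `mass_up_to` and `flat_mass` are hypothetical. It shows that a prior which shrinks with length can give every theory a non-zero probability while the total stays at or below one, whereas any fixed non-zero probability per theory eventually overshoots.

```python
from fractions import Fraction
from itertools import product


def length_prior(theory: str) -> Fraction:
    """Illustrative prior: a binary string of length n gets 2^-(2n+1).

    There are 2^n strings of length n, so length n carries 2^-(n+1)
    in total, and all lengths together sum to exactly one.
    """
    n = len(theory)
    return Fraction(1, 2 ** (2 * n + 1))


def mass_up_to(max_len: int) -> Fraction:
    """Total prior over every binary string of length 0..max_len."""
    total = Fraction(0)
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            total += length_prior("".join(bits))
    return total


def flat_mass(eps: Fraction, count: int) -> Fraction:
    """Total probability if each of `count` theories gets the same eps."""
    return eps * count


if __name__ == "__main__":
    # The decreasing prior approaches one from below as longer theories are admitted.
    for max_len in (0, 1, 5, 10):
        print(max_len, mass_up_to(max_len))
    # A flat prior, however small, exceeds one once there are enough theories.
    print(flat_mass(Fraction(1, 1000), 1001))
```

Exact fractions are used instead of floats so that the sums can be checked without rounding error. Nothing here depends on the specific exponent: any weighting that falls off fast enough to outpace the growing number of theories at each length would serve the same purpose.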

So this is some justification. But even having considered the evidence, theories longer than the shortest possible will still have non-zero probability. There seems no reason to single out one theory over others as the most preferable. Better to look at a "fair and balanced" range of theories.

Turns out there is another property of the shortest theory, proven by Chris Wallace: the shortest theory will tend to be a random sample from that space of all possible theories.

This seems underwhelming at first. However, if it is a random theory, it is an un-biassed theory. There is a well known human tendency: if you have a hammer, everything looks like a nail. To find the shortest theory is a way around such human bias. And, yes, bias really is that big a problem.

See, for example, the entire history of the human race.

It is an interesting question why we should be so biassed. Maybe it's an inevitable consequence of intelligence. Or maybe we just haven't had time to evolve more sensible minds -- even badly broken intelligence beats non-intelligence, and we've conquered the world without pausing to evolve off our rough edges.