Complex reality, simple theories

A number of people have commented that my Cats and Dogs theory is simplistic. I agree. People undoubtedly have more than one dimension to their character. However, I argue this does not mean I should immediately make my theory more complex.

• I am moderately certain that the cat-dog axis is the primary axis of character. Though there may be other factors, they are less important than cat-dog (the next most important axis would probably be introversion-extroversion, but I'm not certain).
• I don't understand these other dimensions. I do not have an accurate model for them, and can not make predictions based on them beyond saying that people will tend to act as they have acted in the past. For example, I would expect someone I had observed not to be talkative in the past to continue not to be talkative in future, calling this "introversion" is simply a label. A label can be useful, but it doesn't make any surprising predictions. Wheras by looking at the cat-dog dimension, I can make predictions about how well two people will get along who have never met, and other such useful things.

Here's a graphical example. Some data (red dots) and two models of that data (lines):

Suppose we are modelling a phenomenon represented by the red dots. The dots follow the complicated curve "y=x+sin(x)", but we don't know that.

I've given two possible models for this curve that we might come up with from looking at it: "y=x", "y=x+cos(x)". The "y=x" model is obviously inadequate, the reality is very obviously more complex than a straight line. However, that does not mean that making the model more complicated is a good thing. The model "y=x+cos(x)" is more complicated, and even has a similar look to the data, but the discrepancy between it and the data is greater than that of "y=x". The simple model will be more accurate than the complex model when trying to predict new data points.

There may be a complex model that fits the data better than a simple model can, but finding such a complex model is really, really, really hard. For every extra bit of model complexity, the search space is doubled. Almost every attempt we make to improve a model by making it more complex will fail. Looking at complex models should thus only be attempted when all possible simpler models have been examined.

Thus I prefer simple models, even when it is obvious that reality is complex.

 [æ]