Surprise is the KL-divergence between prior and posterior


Jiří pointed me at this interesting paper on surprise and attention: Itti and Baldi, 2005, "Bayesian Surprise Attracts Human Attention", with free software available here. Their first application of it was to eye-tracking, which immediately piqued my interest.

Itti and Baldi define surprise as the Kullback-Leibler divergence between the distributions over model hypotheses before and after observing a datum.
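
To make the definition concrete, here is a minimal sketch for a small discrete set of hypotheses. The coin example, the numbers, and the function names are my own illustration, not taken from Itti and Baldi's software, and I am assuming the posterior goes in the first slot of the divergence.

```python
import math

def posterior(prior, likelihoods):
    """Bayes' rule over a discrete set of hypotheses."""
    unnormalised = [p * l for p, l in zip(prior, likelihoods)]
    evidence = sum(unnormalised)  # P(datum)
    return [u / evidence for u in unnormalised]

def kl_bits(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Three hypothetical hypotheses about a coin: P(heads) = 0.1, 0.5 or 0.9.
heads_likelihood = [0.1, 0.5, 0.9]
prior = [1 / 3, 1 / 3, 1 / 3]

# Observe a single head and update.
post = posterior(prior, heads_likelihood)

# Surprise of that observation: how far the beliefs moved.
surprise = kl_bits(post, prior)
print(post, surprise)
```

A datum that leaves the beliefs where they were has zero surprise under this definition, since `kl_bits(prior, prior)` is 0.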

I have previously heard surprise defined as the amount of information a datum contains, but this definition is superior.
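
One way to see the difference, as I understand it: a datum can be very improbable (so it carries a lot of information) and still change nothing about what you believe. The sketch below uses two made-up hypotheses that both assign the same small likelihood to the datum; the numbers are purely illustrative.

```python
import math

# Two hypothetical hypotheses that give the datum the same small likelihood.
prior = [0.5, 0.5]
likelihoods = [0.01, 0.01]

# P(datum), then the posterior by Bayes' rule.
evidence = sum(p * l for p, l in zip(prior, likelihoods))
post = [p * l / evidence for p, l in zip(prior, likelihoods)]

# "Information content" of the datum: -log2 P(datum).
information_bits = -math.log2(evidence)

# KL(posterior || prior): how far the beliefs actually moved.
surprise_bits = sum(a * math.log2(a / b) for a, b in zip(post, prior) if a > 0)

print(information_bits, surprise_bits)
```

The datum is worth several bits of information, yet the posterior equals the prior, so the KL-based surprise is zero.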

This definition of surprise (and attention) is principled and general, and they have presented evidence that it matches human behaviour well. It is very nifty indeed.

My own immediate interest is seeing if I can adapt their software to be autistic (or perhaps non-autistic), and then, hopefully, comparing it to published eye-tracking studies of autism. (One slight drawback here -- their model, though it allows for the possibility in theory, does not yet implement higher-level cognitive processes. Doing so is of course AI-complete, but it would be necessary in order to model human attention perfectly, as human attention is guided by higher-level processes. I am thinking in particular of a study that did eye-tracking on people watching the movie "Who's Afraid of Virginia Woolf?", in which attention (in normal people) was partly directed by what the actors were saying or looking at.)

... I wonder if it would be better to swap the order of the arguments to the KL-divergence from what Itti and Baldi were using. What one really wants is an estimate of the cost of holding your previous erroneous world view, having now updated your beliefs based on the new data point.
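
The order matters because the KL-divergence is not symmetric. A quick sketch, using a made-up three-hypothesis coin example of my own (nothing here comes from the paper or its software), computes both orders for the same update. Which of the two is the paper's convention should be checked against the paper itself.

```python
import math

def kl_bits(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical hypotheses: the coin's P(heads) is 0.1, 0.5 or 0.9.
prior = [1 / 3, 1 / 3, 1 / 3]
heads_likelihood = [0.1, 0.5, 0.9]

# Posterior after observing one head.
unnormalised = [p * l for p, l in zip(prior, heads_likelihood)]
post = [u / sum(unnormalised) for u in unnormalised]

# The two possible argument orders.
post_vs_prior = kl_bits(post, prior)  # expectation taken under the posterior
prior_vs_post = kl_bits(prior, post)  # expectation taken under the prior

print(post_vs_prior, prior_vs_post)
```

The two numbers differ. One practical difference is that the prior-first order becomes infinite whenever the posterior rules out a hypothesis the prior allowed (this sketch would raise a ZeroDivisionError in that case), while the posterior-first order stays finite.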