Why did Chris Wallace use a scale-free prior for hidden factor analysis?

We may never know, of course. But for other quantities whose priors might be fat-tailed (e.g. the number of classes) he tended to use an exponential prior.

To be precise (though omitting a normalization constant and a scaling factor),

h(b) = (1+|b|^2)^(-(K+1)/2)

where b is a hidden factor and K is the dimension of the data. Since b is a K-vector, the prior on |b| (ignoring that annoying +1, which is only important for small |b|) is about

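h(|b|) = |b|^(K-1) * |b|^(-(K+1)) = |b|^(-2)

which may or may not be important. (The |b|^(K-1) factor is the size of the shell of radius |b| in K dimensions. The resulting exponent of -2 is the same in every dimension: steep enough to keep the prior normalizable, which needs anything below -1, but too shallow for |b| to have a finite mean.)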
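
As a quick numerical check on the tail of the prior on |b|, here is a minimal Python sketch. It assumes the radial density is the prior h(b) times the K-dimensional shell factor |b|^(K-1). The function names and the two radii are illustrative choices, not anything taken from Wallace's work.

```python
import math

def log_radial_density(r, K):
    """Unnormalized log density of |b| = r under h(b) = (1 + |b|^2)^(-(K+1)/2)."""
    # The shell of radius r contributes r^(K-1); the prior contributes
    # (1 + r^2)^(-(K+1)/2). Working in logs avoids overflow for large K.
    return (K - 1) * math.log(r) - 0.5 * (K + 1) * math.log1p(r * r)

def tail_slope(K, r_lo=1e3, r_hi=1e4):
    """Log-log slope of the radial density between two large radii."""
    rise = log_radial_density(r_hi, K) - log_radial_density(r_lo, K)
    return rise / (math.log(r_hi) - math.log(r_lo))

for K in (2, 5, 20, 100):
    print(K, round(tail_slope(K), 4))
```

Under these assumptions the slope comes out at about -2 for every K tried.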

Anyway. Consider a multiple hidden factor problem. Chris Wallace was expressing an expectation that such problems would commonly conform to this prior. Even in a single problem, there might be a mix of short factors and long factors conforming roughly to this prior.

So a problem will tend to have this mix of big hidden factors and small hidden factors. Or it might have several clusters, each with its own set of factors.

Maybe this is a partial explanation of how power laws arise: a high-dimensional problem space, with different key factors that vary greatly in scale. Projecting this high-dimensional space down onto lower dimensions (e.g. just looking at magnitude) would yield the familiar power law distributions.
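
The projection onto magnitude can be tried out by simulation. The sketch below is only illustrative. It assumes the prior can be sampled as a standard normal K-vector divided by the absolute value of an independent standard normal (a multivariate t with one degree of freedom). It also uses the Hill estimator, a standard tail-index estimate that is not part of the original argument. The names, K = 10 and the sample sizes are arbitrary choices.

```python
import math
import random

def sample_factor(K, rng):
    """One draw of b from a density proportional to (1 + |b|^2)^(-(K+1)/2)."""
    # Assumed construction: normal K-vector divided by |w|, with w ~ N(0, 1).
    w = abs(rng.gauss(0.0, 1.0))
    return [rng.gauss(0.0, 1.0) / w for _ in range(K)]

def hill_tail_index(values, k):
    """Hill estimate of alpha in P(X > x) ~ x^(-alpha), from the k largest values."""
    top = sorted(values, reverse=True)[:k + 1]
    threshold = top[k]  # the (k+1)-th largest value
    return k / sum(math.log(v / threshold) for v in top[:k])

rng = random.Random(0)  # fixed seed so the run is repeatable
K = 10
lengths = [
    math.sqrt(sum(x * x for x in sample_factor(K, rng)))
    for _ in range(20000)
]
alpha = hill_tail_index(lengths, k=1000)
print(f"tail index of |b|: {alpha:.2f} (density exponent about {-(alpha + 1):.2f})")
```

A tail index near 1 for P(|b| > x) corresponds to a density falling off like |b|^(-2), so in this sketch the magnitudes alone already look like a power law.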

An intelligent exploration of such a problem might make jumps of attention that conform to such hidden factors: a power law distribution of jump sizes, with each jump's size really determined by its direction. The smarter the exploration, the better the match; conversely, a ham-fisted explorer might completely miss a thin ridge. (Again, a projection of the problem space into lower dimensions might make the direction of each attention jump look random.)

Anyway, it would be interesting to know why Chris chose this prior. Presumably mathematical elegance played some part.