The average expected model is approximately integral(x*exp(f(x))) / integral(exp(f(x))).
Iterate x = integral(x*f(x)) / integral(f(x)).
Might work, might not. Wolfram's integrator doesn't like this kind of integral :-(. Still, *if* some way to calculate it could be found (say, to some order of Taylor expansion), that could then be applied to everything.
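As a sanity check on what the target quantity even is, here is a minimal brute-force sketch. It assumes f is a log-density and uses a made-up quadratic f purely for illustration; it just evaluates integral(x*exp(f(x))) / integral(exp(f(x))) on a grid.

```python
import numpy as np

# f is assumed to be a log-density; the quadratic below is a made-up example,
# not anything from the notes above.
def f(x):
    return -0.5 * (x - 1.7) ** 2

xs = np.linspace(-10.0, 10.0, 10001)    # integration grid
w = np.exp(f(xs))                       # exp(f(x)) on the grid
mean_x = np.sum(xs * w) / np.sum(w)     # integral(x*exp(f)) / integral(exp(f))
print(mean_x)                           # ~1.7 for this example
```

Brute-force quadrature like this obviously doesn't scale beyond a few dimensions, hence the hunt for an analytic approximation.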
hmm...
I suspect MML (Minimum Message Length) will turn out to be a way of approximating these integrals. I also suspect such an approximation to be non-trivial -- the maths of MML will probably turn out to be quite an efficient way to do it.
An alternative would be to find a Taylor expansion of P instead of -log(P), and integrate it only over the region in which the Taylor approximation is accurate, assuming P to be zero elsewhere.
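A minimal sketch of that idea, with sympy and a standard Gaussian standing in for P (both my own illustrative choices, not anything fixed above): expand P about a point, integrate the resulting polynomial only over a window where the expansion is trusted, and treat P as zero outside it.

```python
import sympy as sp

x = sp.symbols('x')
P = sp.exp(-x**2 / 2)                       # stand-in for P (my assumption)

poly = sp.series(P, x, 0, 6).removeO()      # Taylor polynomial about x0 = 0
window = (x, -1, 1)                         # region where the expansion is trusted
Z = sp.integrate(poly, window)              # approximate normalising integral
mean = sp.integrate(x * poly, window) / Z   # approximate E[x] (0 by symmetry here)
print(Z, mean)
```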
hmm...
Or the expansion of P(x)*(x-x0)^n, with n at least 2 greater than the order of the Taylor expansion.
Or, since this would have a singularity at x0 when you divide out that added term, P(x)*((x-x0)^n+P(x0)). Wolfram's integrator can cope with the form of integral that would result, though it is rather complex. The results can be pre-calculated in any case. This is probably the way to go. ... err, I am an idiot, of course this will not work. However P(x)*((x-x0)^2+c)^n may.
Note that MML involves third-order derivatives, the minimum necessary to account for any sort of bias: the Fisher information involves second-order derivatives, and then the minimization of the message length, which includes the Fisher information, requires a further derivative.
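For reference, the usual MML87 message length (written from memory, so take the exact constants as indicative; h is the prior over parameters) makes the derivative count explicit:

```latex
\mathrm{MsgLen}(\theta) \approx -\log h(\theta) - \log f(x \mid \theta)
  + \tfrac{1}{2}\log\det F(\theta) + \mathrm{const},
\qquad
F_{ij}(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^2 \log f(x \mid \theta)}{\partial\theta_i\,\partial\theta_j}\right]
```

The Fisher information F already carries second derivatives of log f, so setting the theta-derivative of the whole length to zero drags in third derivatives, as noted above.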
hmm...
Another way to do it, and a seemingly quite natural one, would be to take the Taylor expansion of 1/P(x). One would then need to integrate 1/<polynomial> and x/<polynomial>. Wolfram's integrator thinks this is possible.
hmm... not so good in high dimensions though...
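A rough sympy sketch of the 1/P idea (Gaussian stand-in again, window and expansion order picked arbitrarily): expand 1/P, after which the two quantities needed are integrals of 1/<polynomial> and x/<polynomial>, which do have closed-form antiderivatives, if messy ones.

```python
import sympy as sp

x = sp.symbols('x')
P = sp.exp(-x**2 / 2)                        # Gaussian stand-in again (my assumption)

q = sp.series(1 / P, x, 0, 6).removeO()      # polynomial approximation of 1/P
F = sp.integrate(1 / q, x)                   # antiderivative of 1/<polynomial> exists
G = sp.integrate(x / q, x)                   # antiderivative of x/<polynomial> exists

# Numerical sanity check of the normalising integral over a window around x0.
print(sp.Integral(1 / q, (x, -2, 2)).evalf())
```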
Or a sum-of-Gaussians. If a sum-of-Gaussians could be found, the integration becomes really easy. Also a sum-of-Gaussians can approximate the functions I want pretty well, and can potentially handle Bayes-net type correlations between parameters efficiently. The problem, of course, is to find a good sum efficiently. Perhaps something based on the expectation-maximization loop from mixture modelling, or perhaps some series that can be shown to be convergent.
hmm... OK, so to fit a single Gaussian to a function we fit the log of the Gaussian to the log of the curve -- pick some point and take derivatives up to the second order. The iteration with n Gaussians is then: for each Gaussian, re-fit it to the curve minus the effect of the other Gaussians (do the fit at the current centre-point of that Gaussian). ... so this is similar to my original proposed iteration, but with multiple centre-points floating around.
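Here is a rough numerical sketch of that iteration. The target curve, the finite-difference step, the number of sweeps and the residual clamp are all my own illustrative choices: each Gaussian is fitted by matching the value and first two derivatives of log(curve) at its centre, then the Gaussians are cycled through, each re-fitted against the target minus the others.

```python
import numpy as np

def fit_gaussian_at(h, x0, eps=1e-3):
    """Match log h and its first two derivatives at x0 with a single Gaussian.

    Returns (amplitude, centre, variance); assumes h(x0) > 0 and log h is
    concave at x0 (otherwise the variance would come out negative)."""
    lm, l0, lp = np.log(h(x0 - eps)), np.log(h(x0)), np.log(h(x0 + eps))
    l1 = (lp - lm) / (2 * eps)            # d/dx log h at x0
    l2 = (lp - 2 * l0 + lm) / eps ** 2    # d2/dx2 log h at x0
    sigma2 = -1.0 / l2                    # match the curvature
    mu = x0 + l1 * sigma2                 # shift the centre along the gradient
    amp = np.exp(l0 + (x0 - mu) ** 2 / (2 * sigma2))
    return amp, mu, sigma2

def gaussian(x, amp, mu, sigma2):
    return amp * np.exp(-(x - mu) ** 2 / (2 * sigma2))

def fit_mixture(target, centres, n_iter=20):
    """Cycle through the Gaussians, re-fitting each to (target - the others)
    at its current centre, as described in the note above."""
    params = [fit_gaussian_at(target, c) for c in centres]
    for _ in range(n_iter):
        for i in range(len(params)):
            def residual(x, i=i):
                rest = sum(gaussian(x, *params[j])
                           for j in range(len(params)) if j != i)
                return max(target(x) - rest, 1e-12)   # keep the log defined
            params[i] = fit_gaussian_at(residual, params[i][1])
    return params

# Purely illustrative two-bump target.
target = lambda x: np.exp(-(x - 1.0) ** 2) + 0.5 * np.exp(-2.0 * (x + 1.5) ** 2)
print(fit_mixture(target, centres=[1.0, -1.5]))
```

The clamp on the residual and the fixed number of sweeps are crude; a real version would need a convergence test and some care when the residual goes negative or stops being log-concave at a centre.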