PyMML is a Python package for statistical analysis and automatic classification of data. It implements a selection of the Minimum Message Length (MML) estimators described by Professor Chris Wallace in the book "Statistical and Inductive Inference by Minimum Message Length", including estimators for hidden factor analysis and mixture modelling.
PyMML is open source software, released under the GPL.
Example:
>>> data = array([[9.42,9.20], [1.47,1.15], [5.45,5.85], [7.09,7.04], [9.15,9.39],
...               [30.26,0.53], [30.84,0.82], [30.21,0.32], [30.53,0.52], [30.78,0.45]])
>>> mml.Mixture_estimate(data, ( mml.Hidden_factor_estimate,
...     array([[-100.0,100.0],[-100.0,100.0]]), array([0.01,0.01]), array([100.0,100.0]) ))
Mixture_estimate: 151.95 nits
    50% Hidden_factor_estimate: 64.15 nits
        mean=[ 30.52 0.53]
        sigma=[ 0.29 0.18]
        has_factor=False
        factor_loads=[ 0. 0.]
    50% Hidden_factor_estimate: 79.76 nits
        mean=[ 6.52 6.53]
        sigma=[ 0.28 0.29]
        has_factor=True
        factor_loads=[ 3.24 3.34]
        factor_scores=[ 0.84 -1.57 -0.26 0.16 0.83 2.75 2.88 2.71 2.79 2.82]
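The reported class means can be checked by hand with plain numpy, without PyMML. The sketch below assumes that the first five rows of the example data form one class and the last five the other. That split is read off the data by eye and is not something PyMML reports in this form. It also converts the total message length from nits (natural-log units) to bits. The names group_a, group_b and nits_to_bits are illustrative and are not part of PyMML.

```python
import math

import numpy as np

# Same ten points as in the example above.
data = np.array([[9.42, 9.20], [1.47, 1.15], [5.45, 5.85], [7.09, 7.04], [9.15, 9.39],
                 [30.26, 0.53], [30.84, 0.82], [30.21, 0.32], [30.53, 0.52], [30.78, 0.45]])

# Assumed split: rows 0-4 lie near the line y = x, rows 5-9 cluster near (30.5, 0.5).
group_a = data[:5]
group_b = data[5:]

# Ordinary sample means of each group, one value per column.
mean_a = group_a.mean(axis=0)
mean_b = group_b.mean(axis=0)

print(np.round(mean_a, 2))  # compare with mean=[ 6.52 6.53]
print(np.round(mean_b, 2))  # compare with mean=[ 30.52 0.53]


def nits_to_bits(nits):
    """Convert a message length from nits (base e) to bits (base 2)."""
    return nits / math.log(2)


# Total message length reported for the mixture, expressed in bits.
print(round(nits_to_bits(151.95), 1))
```

The sample means agree with the printed estimates to two decimal places. The sigma values are not compared here, because the class with has_factor=True reports sigma after the factor has been accounted for, which is not the plain sample standard deviation.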
Documentation:
Download:
PyMML uses the numpy and scipy packages.
If you prefer numarray, use PyMML-0.5.tar.gz. Thanks to Loki Davison for the port to numpy.
A variety of MML software written in C and FORTRAN may also be downloaded from here.
The book "Statistical and Inductive Inference by Minimum Message Length" describes the theory behind MML in detail.
PyMML was written on behalf of the School of Computer Science and Software Engineering at Monash University, under the supervision of Dr. David Albrecht.